Using the method of pairwise comparison to obtain reliable teacher assessments

Heldsinger, Sandra; Humphry, Stephen

doi:10.1007/BF03216919

Published: August 2010

The Australian Educational Researcher, volume 37, pages 1–19 (2010)

Abstract

Demands for accountability have seen the implementation of large-scale testing programs in Australia and internationally. There is, however, a growing body of evidence to show that externally imposed testing programs do not have a sustained impact on student achievement. It has been argued that teacher assessment is more effective in raising student achievement levels. However, it is also often argued that teacher assessments are less reliable than the results of testing programs. This paper presents a study in which teachers judged writing scripts using the process of pairwise comparison to generate a scale. The analysis showed high internal consistency of the teacher judgements. The scale locations from pairwise comparisons were highly correlated with scale estimates for the same students from a large-scale testing program. The results demonstrate it is possible to efficiently obtain highly reliable and valid teacher judgements using the process of pairwise comparison. Reliability indices are also provided for a series of small-scale assessments that used the same methodology in a range of other domains. The results support the findings of the main study. The article discusses the benefits of using the method to supplement and validate results from large-scale testing programs.
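To illustrate how a set of pairwise judgements can be turned into scale locations, the sketch below fits a Bradley-Terry model with a simple iterative update. The abstract does not state which estimation procedure the study used, so the model choice, the function name `bradley_terry`, and the judgement data are all assumptions made for illustration; this is not the authors' software or analysis.

```python
import math

def bradley_terry(n_items, comparisons, iterations=200):
    """Estimate scale locations (in logits) from (winner, loser) index pairs.

    Assumes every item wins and loses at least once and that all items are
    linked through the comparisons; otherwise the estimates are undefined.
    """
    wins = [0] * n_items
    pair_counts = {}
    for winner, loser in comparisons:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        pair_counts[key] = pair_counts.get(key, 0) + 1

    strength = [1.0] * n_items
    for _ in range(iterations):
        denom = [0.0] * n_items
        for (a, b), n in pair_counts.items():
            share = n / (strength[a] + strength[b])
            denom[a] += share
            denom[b] += share
        # Update each strength, then rescale so the values stay bounded.
        strength = [wins[i] / denom[i] for i in range(n_items)]
        total = sum(strength)
        strength = [s * n_items / total for s in strength]

    # Convert to logits and centre the scale at zero.
    logits = [math.log(s) for s in strength]
    mean = sum(logits) / n_items
    return [x - mean for x in logits]

# Hypothetical judgements on three scripts: each pair is compared four times,
# and the first-listed script in each pair is preferred three times out of four.
judgements = (
    [(0, 1)] * 3 + [(1, 0)]
    + [(1, 2)] * 3 + [(2, 1)]
    + [(0, 2)] * 3 + [(2, 0)]
)
locations = bradley_terry(3, judgements)
print(locations)
```

With these made-up judgements, script 0 is preferred most often and script 2 least often, so the estimated locations are ordered accordingly. In practice, each teacher would judge only a subset of the possible pairs, and the comparisons would need to link all scripts for the scale to be estimable.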

Author information

Authors and Affiliations

The University of Western Australia, Australia
Sandra Heldsinger & Stephen Humphry

About this article

Cite this article

Heldsinger, S., Humphry, S. Using the method of pairwise comparison to obtain reliable teacher assessments. Aust. Educ. Res. 37, 1–19 (2010). https://doi.org/10.1007/BF03216919

Issue Date: August 2010
DOI: https://doi.org/10.1007/BF03216919
