Psychometrics and its discontents: an historical perspective on the discourse of the measurement tradition

  • Reflections
Advances in Health Sciences Education

Abstract

Psychometrics has recently come under extensive criticism in the medical education literature. Quantitative measurement with psychometric instruments such as response scales is thought to emphasize a narrow range of relevant learner skills and competencies. Recent reviews and commentaries suggest that a paradigm shift may be underway. We argue for caution: the psychometric approach, and the quantitative account of competencies that it reflects, grew out of a rich discussion of measurement and scaling that led to the establishment of this paradigm. Rather than being a homogeneous discipline focused on core competencies and devoid of any consideration of context, the psychometric community has a history of internal discourse and debate, and it acknowledges that the techniques and instruments developed within psychometrics are heuristics that must be used pragmatically.

Author information

Corresponding author

Correspondence to Stanley J. Hamstra.

Cite this article

Schoenherr, J.R., Hamstra, S.J. Psychometrics and its discontents: an historical perspective on the discourse of the measurement tradition. Adv in Health Sci Educ 21, 719–729 (2016). https://doi.org/10.1007/s10459-015-9623-z

  • DOI: https://doi.org/10.1007/s10459-015-9623-z