Abstract
Psychometrics has recently come under extensive criticism in the medical education literature. Quantitative measurement with psychometric instruments such as response scales is thought to emphasize a narrow range of relevant learner skills and competencies. Recent reviews and commentaries suggest that a paradigm shift may be underway. We argue for caution: the psychometric approach, and the quantitative account of competencies that it reflects, rests on a rich discussion of measurement and scaling that led to the establishment of this paradigm. Far from forming a homogeneous discipline focused on core competencies without regard for context, the psychometric community has a history of discourse and debate within the field, together with an acknowledgement that the techniques and instruments developed within psychometrics are heuristics that must be used pragmatically.
Cite this article
Schoenherr, J.R., Hamstra, S.J. Psychometrics and its discontents: an historical perspective on the discourse of the measurement tradition. Adv in Health Sci Educ 21, 719–729 (2016). https://doi.org/10.1007/s10459-015-9623-z