Why assessment in medical education needs a solid foundation in modern test theory
Despite the widespread use of state-of-the-art psychometric models in medical education, a growing body of literature questions their usefulness for the assessment of medical competence. In essence, several authors have raised doubts about the appropriateness of psychometric models as a guiding framework for securing and refining current approaches to assessing medical competence. A phenomenon central to this controversy is case specificity: broadly speaking, the finding that performance is unstable across clinical cases, tasks, or problems. Because stability of performance is, generally speaking, a central assumption of psychometric models, case specificity may limit their applicability, and it has likely supplied critics of psychometrics with a substantial body of potential empirical evidence. This article aims to explain the fundamental ideas employed in psychometric theory and how they might become problematic in the context of assessing medical competence. We further aim to show why and how some critiques hold not for the field of psychometrics as a whole, but only for specific psychometric approaches. Accordingly, we highlight approaches that, from our perspective, offer promising possibilities when applied to the assessment of medical competence. In conclusion, we advocate a more differentiated view of psychometric models and their use.
Keywords: Measurement error · Assessment · Medical competence · Post-psychometric era · Case specificity · Latent variables
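The tension the abstract describes can be made concrete with a generalizability-theory-style variance decomposition. The sketch below is illustrative only: it simulates a hypothetical fully crossed person × case score matrix in which the person-by-case interaction (case specificity) dominates the person (true-score) variance, then estimates the variance components and a relative G coefficient. All sample sizes and variance values are assumptions chosen for the illustration, not figures from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

n_persons, n_cases = 200, 8

# Simulated person x case scores: ability + case difficulty + a large
# person-by-case interaction, mimicking "case specificity".
ability = rng.normal(0, 1.0, size=(n_persons, 1))            # person variance ~ 1.0
case = rng.normal(0, 0.5, size=(1, n_cases))                 # case variance ~ 0.25
interaction = rng.normal(0, 2.0, size=(n_persons, n_cases))  # interaction variance ~ 4.0
scores = ability + case + interaction

# Two-way ANOVA variance components for a fully crossed p x c design
# with one observation per cell (standard G-theory estimators).
grand = scores.mean()
row_means = scores.mean(axis=1)
col_means = scores.mean(axis=0)

ms_p = n_cases * np.sum((row_means - grand) ** 2) / (n_persons - 1)
ss_res = np.sum((scores - row_means[:, None] - col_means[None, :] + grand) ** 2)
ms_res = ss_res / ((n_persons - 1) * (n_cases - 1))

var_p = max((ms_p - ms_res) / n_cases, 0.0)  # person (true-score) variance
var_res = ms_res                             # person x case interaction + error

# Relative G coefficient for an n_cases-long test: when the interaction
# component dominates, many cases are needed for acceptable reliability.
g_coef = var_p / (var_p + var_res / n_cases)
print(f"person variance ~ {var_p:.2f}, residual ~ {var_res:.2f}, G ~ {g_coef:.2f}")
```

With these assumed components, the residual (case-specific) variance is roughly four times the person variance, so even an eight-case test yields only modest reliability; this is the empirical pattern that case-specificity findings report, expressed in psychometric terms.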