Abstract
One approach to validating assessment scores involves evaluating the ability of scores to discriminate among groups who differ in a specific characteristic, such as training status (in education) or disease state (in clinical applications). Such known-groups comparison studies provide validity evidence of “relationships with other variables.” The typical education research study might compare scores between staff physicians and postgraduate trainees with the hypothesis that those with more advanced training (the “experts”) will have higher scores than those less advanced (the “novices”). However, such comparisons are too nonspecific to support clear conclusions, and expert-novice comparisons (and known-groups comparisons in general) thus contribute little to the validity argument. The major flaw is the problem of confounding: there are multiple plausible explanations for any observed between-group differences. The absence of hypothesized differences would suggest a serious flaw in the validity argument, but the confirmation of such differences adds little. As such, accurate known-groups discrimination may be necessary, but will never be sufficient, to support the validity of scores. This article elaborates on this and other problems with the known-groups comparison that limit its utility as a source of validity evidence.
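To make the confounding problem concrete, the short simulation below (my own illustration, not drawn from the article) generates assessment scores that load entirely on a hypothetical nuisance variable, such as familiarity with the test format, and not at all on the intended construct. Because the nuisance variable covaries with training status, a conventional group comparison still "confirms" the hypothesized expert-novice difference. The group sizes, distributions, and variable names are all assumptions chosen for illustration.

```python
# Minimal sketch of confounded known-groups evidence (illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 100  # examinees per group (arbitrary)

# Target construct (the skill the assessment is meant to measure):
# identical in both groups by construction, and generated only to
# emphasize that it plays no role in the scores below.
skill_expert = rng.normal(50, 10, n)
skill_novice = rng.normal(50, 10, n)

# Confound (e.g., familiarity with the test format) differs by group.
confound_expert = rng.normal(10, 3, n)
confound_novice = rng.normal(5, 3, n)

# Scores depend only on the confound, never on the target skill.
score_expert = 4 * confound_expert + rng.normal(0, 5, n)
score_novice = 4 * confound_novice + rng.normal(0, 5, n)

t, p = stats.ttest_ind(score_expert, score_novice)
print(f"experts vs. novices: t = {t:.2f}, p = {p:.2g}")
# Prints a large t and a tiny p: the "known-groups" difference is
# confirmed even though the score reflects none of the intended construct.
```

A nonsignificant result here would indeed signal trouble, but the significant one tells us nothing about which of the many between-group differences the score actually captures; this is the asymmetry the abstract describes.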
Notes
I credit Geoff Norman (personal communication) for the concept of the “grey hair index.”
Cite this article
Cook, D.A. Much ado about differences: why expert-novice comparisons add little to the validity argument. Adv in Health Sci Educ 20, 829–834 (2015). https://doi.org/10.1007/s10459-014-9551-3