Advances in Health Sciences Education

Volume 20, Issue 3, pp 829–834

Much ado about differences: why expert-novice comparisons add little to the validity argument

  • David A. Cook


One approach to validating assessment scores involves evaluating the ability of scores to discriminate among groups who differ in a specific characteristic, such as training status (in education) or disease state (in clinical applications). Such known-groups comparison studies provide validity evidence of “relationships with other variables.” The typical education research study might compare scores between staff physicians and postgraduate trainees with the hypothesis that those with more advanced training (the “experts”) will have higher scores than those less advanced (the “novices”). However, such comparisons are too nonspecific to support clear conclusions, and expert-novice comparisons (and known-groups comparisons in general) thus contribute little to the validity argument. The major flaw is the problem of confounding: there are multiple plausible explanations for any observed between-group differences. The absence of hypothesized differences would suggest a serious flaw in the validity argument, but the confirmation of such differences adds little. As such, accurate known-groups discrimination may be necessary, but will never be sufficient, to support the validity of scores. This article elaborates on this and other problems with the known-groups comparison that limit its utility as a source of validity evidence.


Keywords: Medical education; Data interpretation, statistical; Validation studies; Data collection; Reliability; Assessment; Evaluation



Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. Mayo Clinic Online Learning, Mayo Clinic College of Medicine, Rochester, USA
  2. Division of General Internal Medicine, Mayo Clinic College of Medicine, Rochester, USA
