Much ado about differences: why expert-novice comparisons add little to the validity argument

Advances in Health Sciences Education

Abstract

One approach to validating assessment scores involves evaluating the ability of scores to discriminate among groups who differ in a specific characteristic, such as training status (in education) or disease state (in clinical applications). Such known-groups comparison studies provide validity evidence of “relationships with other variables.” The typical education research study might compare scores between staff physicians and postgraduate trainees with the hypothesis that those with more advanced training (the “experts”) will have higher scores than those less advanced (the “novices”). However, such comparisons are too nonspecific to support clear conclusions, and expert-novice comparisons (and known-groups comparisons in general) thus contribute little to the validity argument. The major flaw is the problem of confounding: there are multiple plausible explanations for any observed between-group differences. The absence of hypothesized differences would suggest a serious flaw in the validity argument, but the confirmation of such differences adds little. As such, accurate known-groups discrimination may be necessary, but will never be sufficient, to support the validity of scores. This article elaborates on this and other problems with the known-groups comparison that limit its utility as a source of validity evidence.
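The confounding problem at the heart of this argument can be made concrete with a small simulation. The sketch below is not from the article; it is a hypothetical Python illustration of two data-generating scenarios that yield the same expert-novice score gap: one in which the gap reflects the construct the assessment is intended to measure, and one in which it reflects only a confounder (for example, familiarity with the test format). The group labels, effect sizes, and variable names are assumptions chosen purely for illustration.

    # Illustrative simulation (an assumption for this sketch, not the article's data):
    # two scenarios that produce the same observed expert-novice score gap.
    #   Scenario A: the gap reflects the construct the test is meant to measure.
    #   Scenario B: the gap reflects only a confounder that also differs between groups.
    import random
    import statistics

    random.seed(1)

    def simulate(skill_gap, confound_gap, n=200, noise_sd=10):
        """Return (novice_scores, expert_scores) under a simple additive score model."""
        novices = [50 + random.gauss(0, noise_sd) for _ in range(n)]
        experts = [50 + skill_gap + confound_gap + random.gauss(0, noise_sd)
                   for _ in range(n)]
        return novices, experts

    for label, skill_gap, confound_gap in [
        ("A: construct difference only", 8, 0),
        ("B: confounder difference only", 0, 8),
    ]:
        novice_scores, expert_scores = simulate(skill_gap, confound_gap)
        gap = statistics.mean(expert_scores) - statistics.mean(novice_scores)
        print(f"Scenario {label}: observed expert-novice gap = {gap:.1f} points")

    # Both scenarios print a gap of roughly 8 points, so the observed difference
    # alone cannot tell us which explanation is correct.

Because both scenarios produce essentially the same observed gap, confirming the hypothesized expert-novice difference cannot distinguish between them; this is the sense in which accurate known-groups discrimination is necessary but never sufficient to support the validity of scores.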


Notes

  1. I credit Geoff Norman (personal communication) for the concept of the "grey hair index."


Author information

Correspondence to David A. Cook.


Cite this article

Cook, D.A. Much ado about differences: why expert-novice comparisons add little to the validity argument. Adv in Health Sci Educ 20, 829–834 (2015). https://doi.org/10.1007/s10459-014-9551-3
