Advances in Health Sciences Education, Volume 7, Issue 3, pp 235–241

Threats to the Validity of Locally Developed Multiple-Choice Tests in Medical Education: Construct-Irrelevant Variance and Construct Underrepresentation

  • Steven M. Downing


Construct-irrelevant variance (CIV), the erroneous inflation or deflation of test scores due to certain types of uncontrolled or systematic measurement error, and construct underrepresentation (CUR), the under-sampling of the achievement domain, are discussed as threats to the meaningful interpretation of scores from objective tests developed for local medical education use. Several sources of CIV and CUR are discussed and remedies are suggested. Test score inflation or deflation, due to the systematic measurement error introduced by CIV, may result from poorly crafted test questions, insecure test questions and other test irregularities, testwiseness, guessing, and test item bias. Using indefensible passing standards can interact with test scores to produce CIV. Sources of construct underrepresentation are associated with tests that are too short to support legitimate inferences to the domain and that are composed of trivial questions written at low levels of the cognitive domain. "Teaching to the test" is another frequent contributor to CUR in examinations used in medical education. Most sources of CIV and CUR can be controlled or eliminated from tests used at all levels of medical education, given proper training and support of the faculty who create these important examinations.

Keywords: construct-irrelevant variance





Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • Steven M. Downing
  1. Department of Medical Education (MC 591), University of Illinois at Chicago, College of Medicine, Chicago, USA
