Advances in Health Sciences Education, Volume 10, Issue 2, pp 133–143

The Effects of Violating Standard Item Writing Principles on Tests and Students: The Consequences of Using Flawed Test Items on Achievement Examinations in Medical Education

  • Steven M. Downing


The purpose of this research was to study the effects of violations of standard multiple-choice item writing principles on test characteristics, student scores, and pass–fail outcomes. Four basic science examinations, administered to year-one and year-two medical students, were randomly selected for study. Test items were classified as either standard or flawed by three independent raters, blinded to all item performance data. Flawed test questions violated one or more standard principles of effective item writing. Thirty-six to sixty-five percent of the items on the four tests were flawed. Flawed items were 0–15 percentage points more difficult than standard items measuring the same construct. Over all four examinations, 646 (53%) students passed the standard items while 575 (47%) passed the flawed items. The median passing rate difference between flawed and standard items was 3.5 percentage points, but ranged from −1 to 35 percentage points. Item flaws had little effect on test score reliability or other psychometric quality indices. Results showed that flawed multiple-choice test items, which violate well-established and evidence-based principles of effective item writing, disadvantage some medical students. Item flaws introduce the systematic error of construct-irrelevant variance to assessments, thereby reducing the validity evidence for examinations and penalizing some examinees.


Keywords: achievement testing in medical education · construct-irrelevant variance (CIV) · flawed test items · item difficulty effects from flawed items · item writing principles · multiple-choice questions (MCQs) · pass–fail effects from flawed items · standard test items · written tests





Copyright information

© Springer 2005

Authors and Affiliations

  1. Department of Medical Education (MC 591), College of Medicine, University of Illinois at Chicago, Chicago, USA
