Advances in Health Sciences Education, Volume 18, Issue 5, pp 945–961

Multiple choice questions can be designed or revised to challenge learners’ critical thinking

  • Rochelle E. Tractenberg
  • Matthew M. Gushta
  • Susan E. Mulroney
  • Peggy A. Weissinger


Multiple choice (MC) questions from a graduate physiology course were evaluated by cognitive-psychology (but not physiology) experts, and analyzed statistically, to test the independence of content expertise and cognitive complexity ratings of MC items. Integration of higher order thinking into MC exams is important, but widely known to be challenging, perhaps especially when content experts must think like novices. Expertise in the domain (content) may actually impede the creation of higher-complexity items. Three cognitive psychology experts independently rated cognitive complexity for 252 multiple-choice physiology items using a six-level cognitive complexity matrix that was synthesized from the literature. Rasch modeling estimated item difficulties. The complexity ratings and difficulty estimates were then analyzed together to determine the relative contributions (and independence) of complexity and difficulty to the likelihood of correct answers on each item. Cognitive complexity was found to be statistically independent of difficulty estimates for 88% of items. Using the complexity matrix, modifications were identified that would increase the complexity of some items by one level without affecting their difficulty. Cognitive complexity can effectively be rated by non-content experts. The six-level complexity matrix, if applied by faculty peer groups trained in cognitive complexity and without domain-specific expertise, could lead to improvements in the complexity targeted with item writing and revision. Targeting higher order thinking with MC questions can be achieved without changing item difficulties or other test characteristics, but this may be less likely if the content expert is left to assess items within their domain of expertise.
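
For readers unfamiliar with Rasch modeling, the following is a minimal sketch of the standard dichotomous Rasch model, in which the probability of a correct answer depends only on the difference between a learner's ability and an item's difficulty (both in logits). It is an illustration only, not the authors' analysis or software; the function name and all numeric values are hypothetical.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct answer under the dichotomous Rasch model.

    Both arguments are on the logit scale. The model has a single item
    parameter (difficulty), so a cognitive complexity rating would be a
    separate attribute of the item, not part of this formula.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values, chosen only for illustration.
learner_ability = 0.5
for item_difficulty in (-1.0, 0.5, 2.0):
    p = rasch_probability(learner_ability, item_difficulty)
    print(f"difficulty={item_difficulty:+.1f}  P(correct)={p:.3f}")
```

When ability equals difficulty the probability is exactly 0.5, and it falls as difficulty rises. Because complexity does not enter the model, two items with the same estimated difficulty can still differ in rated complexity, which is the kind of independence the abstract describes.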


Keywords: Cognitive complexity, Higher order thinking, Multiple-choice test items, Assessment



This work was supported by a Curricular Innovation, Research, and Creativity in Learning Environment (CIRCLE) Grant (intramural (GUMC)) to RET.

Conflict of interest

No declarations of interest to report for any co-author.



Copyright information

© Springer Science+Business Media Dordrecht 2012

Authors and Affiliations

  • Rochelle E. Tractenberg (1, 2, 3; corresponding author)
  • Matthew M. Gushta (4)
  • Susan E. Mulroney (5)
  • Peggy A. Weissinger (6)

  1. Collaborative for Research on Outcomes and -Metrics and Departments of Neurology, Biostatistics, Bioinformatics & Biomathematics, and Psychiatry, Georgetown University Medical Center, Washington, USA
  2. Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, USA
  3. Department of Psychiatry, Georgetown University Medical Center, Washington, USA
  4. Wireless Generation, Washington, USA
  5. Department of Pharmacology & Physiology, Georgetown University Medical Center, Washington, USA
  6. School of Medicine, Georgetown University, Washington, USA
