Multiple choice questions can be designed or revised to challenge learners’ critical thinking
Multiple choice (MC) questions from a graduate physiology course were rated by cognitive-psychology experts (who were not physiology experts) and analyzed statistically to test whether the cognitive complexity of MC items can be rated independently of content expertise. Integrating higher order thinking into MC exams is important but widely acknowledged to be challenging, perhaps especially when content experts must think like novices; expertise in the domain (content) may actually impede the creation of higher-complexity items. Three cognitive psychology experts independently rated the cognitive complexity of 252 multiple-choice physiology items using a six-level cognitive complexity matrix synthesized from the literature, and Rasch modeling was used to estimate item difficulties. The complexity ratings and difficulty estimates were then analyzed together to determine the relative contributions (and independence) of complexity and difficulty to the likelihood of a correct answer on each item. Cognitive complexity was found to be statistically independent of the difficulty estimates for 88% of items. Using the complexity matrix, modifications were identified that would raise some items' complexity by one level without affecting item difficulty. Cognitive complexity can therefore be rated effectively by non-content experts. The six-level complexity matrix, applied by faculty peer groups trained in cognitive complexity but without domain-specific expertise, could improve the level of complexity targeted in item writing and revision. Targeting higher order thinking with MC questions can be achieved without changing item difficulties or other test characteristics, but this may be less likely when content experts are left to assess items within their own domain of expertise.
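For context, "Rasch modeling" here refers to the standard dichotomous Rasch model; in its textbook formulation (stated here for background, not reproduced from the article itself), the probability that examinee n answers item i correctly is

P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},

where \theta_n is the examinee's ability and b_i the item's difficulty; the estimated b_i are the item difficulties that were analyzed alongside the experts' complexity ratings.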
Keywords: Cognitive complexity · Higher order thinking · Multiple-choice test items · Assessment
This work was supported by an intramural (GUMC) Curricular Innovation, Research, and Creativity in Learning Environment (CIRCLE) grant to RET.
Conflict of interest
The authors have no conflicts of interest to report.