Multiple choice questions can be designed or revised to challenge learners’ critical thinking

Advances in Health Sciences Education

Abstract

Multiple choice (MC) questions from a graduate physiology course were evaluated by cognitive-psychology (but not physiology) experts, and analyzed statistically, in order to test the independence of content expertise and cognitive complexity ratings of MC items. Integration of higher order thinking into MC exams is important, but widely known to be challenging, perhaps especially when content experts must think like novices. Expertise in the domain (content) may actually impede the creation of higher-complexity items. Three cognitive psychology experts independently rated cognitive complexity for 252 multiple-choice physiology items using a six-level cognitive complexity matrix that was synthesized from the literature. Rasch modeling estimated item difficulties. The complexity ratings and difficulty estimates were then analyzed together to determine the relative contributions (and independence) of complexity and difficulty to the likelihood of correct answers on each item. Cognitive complexity was found to be statistically independent of difficulty estimates for 88% of items. Using the complexity matrix, modifications were identified to increase some item complexities by one level without affecting the items' difficulties. Cognitive complexity can effectively be rated by non-content experts. The six-level complexity matrix, if applied by faculty peer groups trained in cognitive complexity and without domain-specific expertise, could lead to improvements in the complexity targeted with item writing and revision. Targeting higher order thinking with MC questions can be achieved without changing item difficulties or other test characteristics, but this may be less likely if the content expert is left to assess items within their domain of expertise.
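
For context on the Rasch-based difficulty estimation mentioned above, the sketch below shows the standard dichotomous Rasch model, in which the probability of a correct response depends only on the difference between person ability and item difficulty. The ability and difficulty values are hypothetical, chosen purely for illustration; this is not code from the study.

```python
import numpy as np

def rasch_probability(theta, difficulty):
    """Dichotomous Rasch model: P(correct) for a person with ability
    theta (in logits) on an item with the given difficulty (in logits)."""
    return 1.0 / (1.0 + np.exp(-(theta - difficulty)))

# Hypothetical values: a person of average ability (theta = 0) facing items
# that are one logit easier than, equal to, and one logit harder than their ability.
for b in (-1.0, 0.0, 1.0):
    print(f"item difficulty {b:+.1f}: P(correct) = {rasch_probability(0.0, b):.2f}")
```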

References

  • American Psychological Association, National Council on Measurement in Education, American Educational Research Association. (1999). Standards for educational and psychological testing, 2E. Washington, DC: American Educational Research Association.

  • Anderson, L. W., Krathwohl, D. R., Airasian, P. W., Cruikshank, K. A., Mayer, R. E., Pintrich, P. R., et al. (Eds.). (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational objectives. New York: Longman.

  • Anderson, J. R. (2005). Cognitive psychology and its implications, 6E. New York, NY: Worth Publishers.

  • Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: The classification of educational goals, by a committee of college and university examiners. Handbook I: Cognitive domain. New York: David McKay.

  • Bond, T. G., & Fox, C. M. (2007). Applying the Rasch Model: Fundamental measurement in the human sciences, 2E. Mahwah, NJ: Lawrence Erlbaum Associates.

  • Bruff, D. (2009). Teaching with classroom response systems: Creating active learning environments. San Francisco, CA: Jossey-Bass.

  • Buckles, S., & Siegfried, J. J. (2006). Using multiple-choice questions to evaluate in-depth learning of economics. The Journal of Economic Education, 37(1), 48–57.

  • Case, S. M., & Swanson, D. B. (2002). Constructing written test questions for the basic and clinical sciences, 3E-Revised. Philadelphia: National Board of Medical Examiners.

  • Cizek, G. J., & Bunch, M. B. (2008). Standard setting: A guide to establishing and evaluating performance standards on tests. Newbury Park, CA: Sage Publications.

  • Crocker, L., & Algina, J. (1986). Introduction to classical & modern test theory. Belmont, CA: Wadsworth Group.

  • Custers, E. J. F. M., & Boshuizen, H. P. A. (2002). The psychology of learning. In G. R. Norman, C. P. M. van der Vleuten, & D. L. Newble (Eds.), International handbook of research in medical education (Vol. 1, pp. 163–203). Dordrecht: Kluwer.

  • Dimitrov, D. (2007). Least squares distance method of cognitive validation and analysis for binary items using their item response theory parameters. Applied Psychological Measurement, 31, 367–387.

  • Downing, S. M. (2002). Assessment of knowledge with written test forms. In G. R. Norman, C. P. M. van der Vleuten, & D. L. Newble (Eds.), International handbook of research in medical education (Vol. 2, pp. 647–672). Dordrecht: Kluwer.

  • Ericsson, K. A. (2004). Deliberate practice and the acquisition and maintenance of expert performance in medicine and related domains. Academic Medicine, 79(10 Suppl), S70–S81.

  • Gierl, M. J., Leighton, J. P., & Hunka, S. M. (2000). Exploring the logic of Tatsuoka’s rule-space model for test development and analysis. An NCME instructional module. Educational Measurement: Issues and Practice, 19(3), 34–44.

  • Gruppen, L. D., & Frohna, A. Z. (2002). Clinical Reasoning. In G. R. Norman, C. P. M. van der Vleuten, & D. L. Newble (Eds.), International handbook of research in medical education (Vol. 1, pp. 205–230). Dordrecht: Kluwer.

  • Gushta, M. M., Yumoto, F., & Williams, A. (2009). Separating item difficulty and cognitive complexity in educational achievement testing. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.

  • Haladyna, T. M. (1997). Writing test items to evaluate higher order thinking. Needham Heights, MA: Allyn & Bacon.

  • Linacre, J. M. (2007). A User’s guide to WINSTEPS® Rasch-model computer program. Chicago, IL: Author. Downloaded 10 October 2007 from http://www.winsteps.com/winsteps.htm.

  • Mislevy, R. J., & Huang, C.-W. (2007). Measurement models as narrative structures. In M. von Davier & C. H. Carstensen (Eds.), Multivariate and mixture distribution Rasch models: Extensions & applications (pp. 16–35). New York: Springer.

  • Moseley, D., Baumfield, V., Elliott, J., Gregson, M., Higgins, S., Miller, J., et al. (2005). Frameworks for thinking. Cambridge, UK: Cambridge University Press.

  • Rupp, A. A., & Mislevy, R. J. (2007). Cognitive foundations of structured item response models. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment: Theories and applications (pp. 205–241). Cambridge: Cambridge University Press.

  • Shelton, S. W. (1999). The effect of experience on the use of irrelevant evidence in auditor judgment. The Accounting Review, 74(2), 217–224.

  • Smith, R. M., Schumacker, R. E., & Bush, J. J. (1998). Using item mean squares to evaluate fit to the Rasch model. Journal of Outcome Measurement, 2, 66–78.

  • Tardieu, H., Ehrlich, M.-F., & Gyselinck, V. (1992). Levels of representation and domain-specific knowledge in comprehension of scientific texts. Language and Cognitive Processes, 7(3–4), 335–351. doi:10.1080/01690969208409390.

  • Tatsuoka, K. K. (1983). Rule space: An approach for dealing with misconceptions based on item response theory. Journal of Educational Measurement, 20(4), 345–354.

  • van de Watering, G., & van der Rijt, J. (2006). Teachers’ and students’ perceptions of assessments: A review and a study into the ability and accuracy of estimating the difficulty levels of assessment items. Educational Research Review, 1(2), 133–147.

  • van Hoeij, M. J. W., Haarhuis, J. C. M., Wierstra, R. F. A., & van Beukelen, P. (2004). Developing a classification tool based on Bloom’s Taxonomy to assess the cognitive level of short essay questions. Journal of Veterinary Medical Education, 31(3), 261–267.

  • Williams, R. D., & Haladyna, T. M. (1982). Logical operations for generating intended questions (LOGIQ): A typology for higher level test items. In G. H. Roid & T. M. Haladyna (Eds.), A technology for test-item writing (pp. 161–186). New York: Academic Press.

  • Zheng, A. Y., Lawhorn, J. K., Lumley, T., & Freeman, S. (2008). Application of Bloom’s taxonomy debunks the “MCAT Myth”. Science, 319, 414–415. doi:10.1126/science.1147852.

Acknowledgments

This work was supported by a Curricular Innovation, Research, and Creativity in Learning Environment (CIRCLE) grant (intramural, GUMC) to RET.

Conflict of interest

No declarations of interest to report for any co-author.

Author information

Correspondence to Rochelle E. Tractenberg.

Appendix

The Least Squares Distance Model (LSDM; Dimitrov 2007) uses existing IRT item parameter estimates obtained from a separate procedure or program and an appropriate Q-matrix to model attribute probabilities for fixed levels of ability. These probability estimates are calculated as intact units for each fixed level of theta according to the following equations:

$$ P_{ij} = \prod\limits_{k = 1}^{K} \left[ P\left( A_{k} = 1 \mid \theta_{i} \right) \right]^{q_{jk}}, \quad \text{and therefore} $$
$$ \ln P_{ij} = \sum\limits_{k = 1}^{K} q_{jk} \ln P\left( A_{k} = 1 \mid \theta_{i} \right), $$

similar to the Rasch model, where \( P_{ij} \) is the probability of a correct response on item \( j \) by person \( i \) given ability \( \theta_{i} \); \( P\left( A_{k} = 1 \mid \theta_{i} \right) \) is the probability of correct performance on attribute \( A_{k} \) for a person with ability level \( \theta_{i} \); and \( q_{jk} \) is the Q-matrix element (0 or 1) associated with item \( j \) and attribute \( A_{k} \).
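
To make this forward relationship concrete, the following minimal sketch computes item probabilities at one fixed ability level as the product of attribute probabilities raised to the corresponding Q-matrix entries. The Q-matrix and attribute probabilities are hypothetical and for illustration only; they are not taken from the article's data.

```python
import numpy as np

# Hypothetical Q-matrix: 3 items (rows) by 2 attributes (columns);
# Q[j, k] = 1 if item j requires attribute k.
Q = np.array([[1, 0],
              [0, 1],
              [1, 1]])

# Hypothetical attribute probabilities P(A_k = 1 | theta_i) at one fixed theta.
attr_prob = np.array([0.8, 0.6])

# Item probability = product over attributes of attr_prob ** q_jk,
# which is equivalent to exp(Q @ ln(attr_prob)).
item_prob = np.exp(Q @ np.log(attr_prob))
print(item_prob)  # [0.8  0.6  0.48]
```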

With \( n \) binary items, this generates a system of \( n \) linear equations in \( K \) unknowns, \( \ln P\left( A_{k} = 1 \mid \theta_{i} \right) \), for each fixed level of ability. In matrix form the system is \( L = QX \), where \( L \) is the known vector with elements \( \ln P_{ij} \); \( Q \) is the known Q-matrix; and \( X \) is the unknown vector with elements \( X_{k} = \ln P\left( A_{k} = 1 \mid \theta_{i} \right) \).

Minimizing the Euclidean norm \( \lVert QX - L \rVert \) yields the unknown vector \( X \) and the least squares distance (LSD), and the probability of a correct response for a student with ability \( \theta_{i} \) on an item associated with attribute \( A_{k} \) is \( P\left( A_{k} = 1 \mid \theta_{i} \right) = \exp\left( X_{k} \right) \). The LSDM item probabilities are then computed at each ability level as the product of the attribute probabilities; these approximate the probabilities calculated under the Rasch model.
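
A minimal sketch of this inverse step is given below, using the same hypothetical Q-matrix as above: given hypothetical Rasch-based item probabilities at one fixed ability level, an ordinary least-squares solve of \( L = QX \) recovers the attribute log-probabilities, which are exponentiated to obtain the attribute probabilities. This illustrates the least-squares distance idea only; it is not the LSDM implementation used in the study.

```python
import numpy as np

# Same hypothetical Q-matrix: 3 items (rows) by 2 attributes (columns).
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Hypothetical Rasch-based item probabilities at one fixed ability level.
item_prob = np.array([0.78, 0.62, 0.47])
L = np.log(item_prob)                      # known vector of ln(P_ij)

# Least-squares solution of L = Q X for the attribute log-probabilities X.
X, *_ = np.linalg.lstsq(Q, L, rcond=None)
attr_prob = np.exp(X)                      # P(A_k = 1 | theta_i) = exp(X_k)

# Least squares distance (LSD): how far the fitted Q X is from L.
lsd = np.linalg.norm(Q @ X - L)
print(attr_prob, lsd)
```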

About this article

Cite this article

Tractenberg, R.E., Gushta, M.M., Mulroney, S.E. et al. Multiple choice questions can be designed or revised to challenge learners’ critical thinking. Adv in Health Sci Educ 18, 945–961 (2013). https://doi.org/10.1007/s10459-012-9434-4
