Pick-N multiple choice-exams: a comparison of scoring algorithms


Abstract

The aim was to compare different scoring algorithms for Pick-N multiple-correct-answer multiple-choice (MC) exams with regard to test reliability, student performance, total item discrimination and item difficulty. Data from six end-of-term exams in internal medicine taken by 3rd-year medical students at Munich University from 2005 to 2008 were analysed (1,255 students, 180 Pick-N items in total). Each question scored a maximum of one point. We compared: (a) dichotomous scoring (DS): one point if all true and no wrong answers were chosen; (b) partial credit algorithm 1 (PS50): one point for 100% true answers, 0.5 points for 50% or more true answers, zero points for less than 50% true answers, with no point deduction for wrong choices; (c) partial credit algorithm 2 (PS1/m): a fraction of one point, depending on the total number of true answers, for each true answer identified, again with no point deduction for wrong choices. Partial crediting yielded psychometric results superior to dichotomous scoring. The two partial credit algorithms produced similar psychometric data, with PS50 only slightly exceeding PS1/m in coefficients of reliability. The Pick-N MC format scored with the PS50 or PS1/m algorithm is suited to undergraduate medical examinations; partial knowledge should be rewarded in Pick-N MC exams.
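In code, the three algorithms reduce to simple functions of m, the number of true (keyed) answers a candidate marks, out of n keyed answers in total. The following is a minimal sketch under the definitions given above; the function and variable names are illustrative, not taken from the paper:

```python
def score_ds(chosen: set, key: set) -> float:
    """Dichotomous scoring (DS): one point only if all true and
    no wrong answers were chosen, i.e. the sets match exactly."""
    return 1.0 if chosen == key else 0.0


def score_ps50(chosen: set, key: set) -> float:
    """Partial credit PS50: 1.0 for 100% of the true answers,
    0.5 for at least 50% of them, else 0; no deduction for
    wrong choices."""
    m = len(chosen & key)  # number of true answers identified
    if m == len(key):
        return 1.0
    return 0.5 if m >= len(key) / 2 else 0.0


def score_ps1m(chosen: set, key: set) -> float:
    """Partial credit PS1/m: 1/n of a point for each true answer
    identified, where n is the total number of true answers;
    no deduction for wrong choices."""
    return len(chosen & key) / len(key)
```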


Notes

  1. See “Appendix” for example.

  2. i.e., containing the autocorrelation.


Acknowledgments

The authors thank René Krebs of Berne University, Switzerland, and Andreas Möltner of the University of Heidelberg, Germany, for their helpful suggestions.

Author information

Corresponding author

Correspondence to Daniel Bauer.

Appendix

Exemplary Pick-N item, taken from the mid-term exam of 2007/2008:

Mrs W., a 39-year-old patient in previously good health, presents as an inpatient complaining of a persistent cough. Three weeks ago she suffered from a cold, and initially the cough was non-productive. The cough has now been producing yellow sputum for 3 days; in addition, her temperature has risen to 39°C and she complains of stabbing pain on the left side when inhaling.

Given a suspected case of community-acquired pneumonia, which of the following initial diagnostic steps do you take in addition to the physical examination?

(Select four answers!)

A. Blood gas analysis
B. Bronchoscopy
C. Thorax CT
D. Thorax x-ray
E. Standard lab
F. Spiroergometry
G. Sputum tests
H. Thorax sonography
I. Ventilation scintigraphy

Answers A, D, E and H were keyed as correct.

This item would be scored as follows:

 

m        0      1      2      3      4
DS       0.00   0.00   0.00   0.00   1.00
PS50     0.00   0.00   0.50   0.50   1.00
PS1/m    0.00   0.25   0.50   0.75   1.00

Points earned in a Pick-N item with n = 4 correct answers; m denotes the number of correct answers chosen by the candidate.
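As a quick check, the sketch from the abstract section reproduces the row for m = 3 with a hypothetical candidate who picks A, D, E and F:

```python
key = {"A", "D", "E", "H"}     # keyed answers for the item above, n = 4
chosen = {"A", "D", "E", "F"}  # hypothetical candidate, m = 3 true answers

print(score_ds(chosen, key))    # 0.0
print(score_ps50(chosen, key))  # 0.5
print(score_ps1m(chosen, key))  # 0.75
```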


About this article

Cite this article

Bauer, D., Holzer, M., Kopp, V. et al. Pick-N multiple choice-exams: a comparison of scoring algorithms. Adv in Health Sci Educ 16, 211–221 (2011). https://doi.org/10.1007/s10459-010-9256-1
