Abstract
Although studies have examined the effects of a variety of factors on the comparability of scores obtained from standardized patient examinations (SPE), little research has specifically investigated the challenge of detecting drift in case difficulty estimates over time, particularly for large-scale, performance-based assessments. The purpose of the current study was to investigate the use of a procedure to detect drift in the difficulty estimates for a large-scale, high-stakes SPE. The results of this investigation suggest that, for particular performance tasks, there was some variation in mean scores over time. These findings indicate that, although it is feasible to create a bank of case-SP means and link scores back to these fixed estimates, special attention must be paid to the standardization of exam materials over time. This is essential to ensure comparability of scores and pass-fail decisions for candidates who are assessed on multiple test forms throughout the year.
Cite this article
McKinley, D.W., Boulet, J.R. Detecting Score Drift in a High-Stakes Performance-Based Assessment. Adv Health Sci Educ Theory Pract 9, 29–38 (2004). https://doi.org/10.1023/B:AHSE.0000012214.40340.03
DOI: https://doi.org/10.1023/B:AHSE.0000012214.40340.03