Detecting Score Drift in a High-Stakes Performance-Based Assessment

McKinley, Danette W.; Boulet, John R.

doi:10.1023/B:AHSE.0000012214.40340.03

Published: March 2004

Advances in Health Sciences Education, Volume 9, pages 29–38 (2004)

Abstract

Although studies have examined the effects of a variety of factors on the comparability of scores obtained from standardized patient examinations (SPE), little research has specifically investigated the challenge of detecting drift in case difficulty estimates over time, particularly for large-scale, performance-based assessments. The purpose of the current study was to investigate a procedure for detecting drift in the difficulty estimates for a large-scale, high-stakes SPE. The results suggest that, for particular performance tasks, mean scores varied somewhat over time. These findings indicate that, although it is feasible to create a bank of case-SP means and link scores back to these fixed estimates, special attention must be paid to the standardization of exam materials over time. This is essential to ensure comparability of scores and pass-fail decisions for candidates who are assessed on multiple test forms throughout the year.

Author information

Authors and Affiliations

Research and Evaluation, Educational Commission for Foreign Medical Graduates, 3624 Market Street, 4th Floor, Philadelphia, PA, 19104, USA
Danette W. McKinley & John R. Boulet

Corresponding author

Correspondence to Danette W. McKinley.

About this article

Cite this article

McKinley, D.W., Boulet, J.R. Detecting Score Drift in a High-Stakes Performance-Based Assessment. Adv Health Sci Educ Theory Pract 9, 29–38 (2004). https://doi.org/10.1023/B:AHSE.0000012214.40340.03

Issue Date: March 2004
DOI: https://doi.org/10.1023/B:AHSE.0000012214.40340.03
