Clinical Skills Assessment with Standardized Patients in High-Stakes Tests: A Framework for Thinking about Score Precision, Equating, and Security

Swanson, David B.; Clauser, Brian E.; Case, Susan M.

doi:10.1023/A:1009862220473

Published: January 1999

Volume 4, pages 67–106 (1999)

Advances in Health Sciences Education

David B. Swanson¹,
Brian E. Clauser¹ &
Susan M. Case¹

Author information

Authors and Affiliations

National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA, 19104, USA
David B. Swanson, Brian E. Clauser & Susan M. Case

Corresponding author

Correspondence to David B. Swanson.

About this article

Cite this article

Swanson, D.B., Clauser, B.E. & Case, S.M. Clinical Skills Assessment with Standardized Patients in High-Stakes Tests: A Framework for Thinking about Score Precision, Equating, and Security. Adv Health Sci Educ Theory Pract 4, 67–106 (1999). https://doi.org/10.1023/A:1009862220473

Issue Date: January 1999
DOI: https://doi.org/10.1023/A:1009862220473
