Clinical Skills Assessment with Standardized Patients in High-Stakes Tests: A Framework for Thinking about Score Precision, Equating, and Security

Advances in Health Sciences Education

Author information

Corresponding author

Correspondence to David B. Swanson.

About this article

Cite this article

Swanson, D.B., Clauser, B.E. & Case, S.M. Clinical Skills Assessment with Standardized Patients in High-Stakes Tests: A Framework for Thinking about Score Precision, Equating, and Security. Adv Health Sci Educ Theory Pract 4, 67–106 (1999). https://doi.org/10.1023/A:1009862220473

  • DOI: https://doi.org/10.1023/A:1009862220473
