Advances in Health Sciences Education, Volume 4, Issue 1, pp 67–106

Clinical Skills Assessment with Standardized Patients in High-Stakes Tests: A Framework for Thinking about Score Precision, Equating, and Security

  • David B. Swanson
  • Brian E. Clauser
  • Susan M. Case

Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • David B. Swanson (1)
  • Brian E. Clauser (1)
  • Susan M. Case (1)

  1. National Board of Medical Examiners, Philadelphia, USA