References
Angoff, W. H. (1971). Scales, Norms, and Equivalent Scores. In: R. L. Thorndike (ed.) Educational Measurement (2nd edn.). Washington, DC: American Council on Education, 508–600.
Association of American Medical Colleges (1998). Emerging Trends in the Use of Standardized Patients. Contemporary Issues in Medical Education 1(7), 1–2.
Battles, J. B., Carpenter, J. L., McIntire, D. & Wagner, J. M. (1994). Analyzing and Adjusting for Variables in a Large-Scale Standardized-Patient Examination. Academic Medicine 69(5), 370–376.
Brennan, R. L. (1992). Elements of Generalizability Theory (rev. ed.). Iowa City, IA: American College Testing Program.
Brennan, R. L. (1995). Generalizability of Performance Assessments. Educational Measurement: Issues and Practice 14(4), 9–12, 27.
Case, S. M., Templeton, B., Samph, T. & Best, A. M. III (1992). Comparison of Observation-Based and Chart-Based Scores Derived from Standardized Patient Encounters. In: R. Harden, I. Hart & H. Mulholland (eds.) Approaches to Assessment of Clinical Competence. Norwich, England: Page Brothers, 471–475.
Clauser, B. (1998). Equating Performance Assessments with the Rasch Rating-Scale Model Using Internal and External Links. Paper Presentation, Annual Meeting of the American Educational Research Association.
Cohen, R. et al. (1993). Impact of Repeated Use of Objective Structured Clinical Examination Stations. Academic Medicine (October Supplement), S73-S75.
Colliver, J. A. et al. (1989). Reliability of Performance on Standardized-Patient Cases: A Comparison of Consistency Measures Based on Generalizability Theory. Teaching and Learning in Medicine 1(1), 31–37.
Colliver, J. A. et al. (1990). Three Studies of the Effect of Multiple Standardized-Patients on Intercase Reliability of Five Standardized-Patient Examinations. Teaching and Learning in Medicine 2(4), 237–245.
Colliver, J. A. et al. (1991a). Effects of Using Two or More Standardized-Patients to Simulate the Same Case Means and Case Failure Rates. Academic Medicine 66(10), 616–618.
Colliver, J. A. et al. (1991b). Test Security in Examinations That Use Standardized-Patient Cases at One Medical School. Academic Medicine 66(5), 279–282.
Colliver, J. A. et al. (1991c). Test Security in Examinations Using Standardized-Patient Cases for Five Classes of Senior Medical Students. Academic Medicine 66, 279–282.
Colliver, J. A. et al. (1994). Effect of Using Multiple Standardized Patients to Rate Interpersonal and Communication Skills on Intercase Reliability. Teaching and Learning in Medicine 6(1), 45–48.
Colliver, J. A. et al. (1998). The Effect of Using Multiple Standardized Patients on the Inter-Case Reliability of a Large-Scale Standardized-Patient Examination Administered over an Extended Testing Period. Academic Medicine 73(October Supplement), S81-S83.
Crick, J. E. & Brennan, R. L. (1983). The Manual for Genova. Iowa City, IA: American College Testing Program.
Cronbach, L. J., Gleser, G. C., Nanda, H. H. & Rajaratnam, N. (1972). Dependability of Behavioral Measurements: Theory of Generalizability for Scores and Profiles. New York: John Wiley and Sons Inc.
DeChamplain, A. F. et al. (1997). Standardized Patients' Accuracy in Recording Examinees' Behaviors Using Checklists. Academic Medicine 72(October Supplement), S85-S87.
DeChamplain, A. F. et al. (in press). Do Standardized Patients' Recording Discrepancies Impact upon Case and Examination Mastery-Level Decisions? Academic Medicine.
DeChamplain, A. F. et al. (under editorial review). Modeling the Effects of a Security Breach and Test Preparation on a Large-Scale Performance-Based Assessment.
Fitzpatrick, R. & Morrison, E. J. (1971). Performance and Product Evaluation. In: R. L. Thorndike (ed.) Educational Measurement. Washington, DC: American Council on Education, 237–270.
Furman, G. E. et al. (1997). The Effect of Formal Feedback Sessions on Test Security for a Clinical Practice Examination Using Standardized Patients. In: A. J. J. A. Scherpbier, C. P. M. Van der Vleuten, J. J. Rethans & A. F. W. Van der Steeg (eds.) Advances in Medical Education. Dordrecht, The Netherlands: Kluwer Academic Publishers, 433–436.
Gessaroli, M. E., Swanson, D. B. & DeChamplain, A. F. (1998). Equating Performance Assessments Using Structural Equation Models. Paper Presentation, Annual Meeting of the American Educational Research Association.
Grand'Maison, P. et al. (1992). Large Scale Use of an Objective Structured Clinical Examination for Licensing Family Physicians. Canadian Medical Association Journal 146(10), 1735–1740.
Hambleton, R. K. & Swaminathan, H. (1985). Item Response Theory: Principles and Applications. Boston: Kluwer Academic Publishers.
Highland, R. W. (1955). A Guide for Use in Performance Testing in Air Force Technical Schools. Armament Systems Personnel Research Laboratory. Colorado: Lowry Air Force Base.
Jolly, B. (1993). Learning Effect of Reusing Stations in an Objective Structured Clinical Examination. Teaching and Learning in Medicine 6(2), 66–71.
Klass, D. J. (1994). High-Stakes Testing of Medical Students Using Standardized Patients. Teaching and Learning in Medicine 6, 23–27.
Klass, D. J. et al. (1994). Progress in Developing a Standardized Patient Test of Clinical Skills at The National Board of Medical Examiners: Prototype Two. Proceedings of The Sixth Ottawa Conference on Medical Education. Toronto, Canada: University of Toronto Bookstore Custom Publishing, 324–326.
Klass, D. J. et al. (in press). Development of a Performance-Based Test of Clinical Skills for the United States Medical Licensing Examination. Proceedings of the 8th Annual Ottawa Conference.
Kolen, M. J. & Brennan, R. L. (1995). Test Equating: Methods and Practices. New York: Springer.
Linn, R. (1993). Linking Results of Distinct Assessments. Applied Measurement in Education 6(1), 83–102.
Livingston, S. & Lewis, C. (1995). Estimating the Consistency and Accuracy of Classifications Based on Test Scores. Journal of Educational Measurement 32(2), 179–197.
Lord, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Luecht, R. M. & DeChamplain, A. F. (1998). Applications of Latent Class Analysis to Mastery Decisions Using Complex Performance Assessments. Paper Presentation, Annual Meeting of the American Educational Research Association.
Mislevy, R. (1992). Linking Educational Assessments: Concepts, Issues, Methods, and Prospects. ERIC Document #ED353302.
Niehaus, A. H., DaRosa, D. A., Markwell, S. J. & Folse, R. (1996). Is Test Security a Concern when OSCE Stations Are Repeated across Clerkship Rotations? Academic Medicine 71(October Supplement), S287-S289.
Norman, G. R., Van der Vleuten, C. P. M. & de Graaff, E. (1991). Pitfalls in the Pursuit of Objectivity: Issues of Validity, Efficiency and Acceptability. Medical Education 25, 119–126.
Reznick, R. K., Smee, S. M., Rothman, A. I., Chalmers, A., Swanson, D. B., Dufresne, L. et al. (1992). An Objective Structured Clinical Examination for the Licentiate: Report of the Pilot Project of the Medical Council of Canada. Academic Medicine 67, 487–494.
Reznick, R. K., Blackmore, D. E., Cohen, R., Baumber, J., Rothman, A. I., Smee, S. M., Chalmers, A., Poldre, P., Birtwhistle, R., Walsh, P., Spady, D. & Berard, M. (1993). An Objective Structured Clinical Exam for the Licentiate of the Medical Council of Canada: From Research to Reality. Academic Medicine 68(Suppl.), S4-S6.
Reznick, R. K., Blackmore, D. E., Dauphinee, W. D., Smee, S. M. & Rothman, A. I. (1997). An OSCE for Licensure: The Canadian Experience. In: A. J. J. A. Scherpbier et al. (eds.) Advances in Medical Education. Dordrecht: Kluwer Academic Publishers, 458–461.
Ripkey, D. R., Case, S. M. & Swanson, D. B. (1997). Predicting Performances on the NBME Surgery Subject Test and USMLE Step 2: Effects of Surgery Clerkship Timing and Length. Academic Medicine 72(October Supplement), S31-S33.
Rothman, A. I., Cohen, R., Dawson-Saunders, E., Poldre, P. P. & Ross, J. (1992). Testing the Equivalence of Multiple Station Tests of Clinical Competence. Academic Medicine 67(October Supplement), S40-S41.
Rutala, R. J. (1991). Sharing of Information by Students in an OSCE. Archives of Internal Medicine 151, 541–544.
Searle, S. R. (1971). Linear Models. New York: John Wiley and Sons.
Shavelson, R., Webb, N. & Rowley, G. (1989). Generalizability Theory. American Psychologist 44(6), 922–932.
Skakun, E. N., Cook, D. A. & Morrison, J. C. (1992). Test Security on Sequential OSCE and Multiple-Choice Examinations. In: I. R. Hart, R. M. Harden & J. Des Marchais (eds.) Current Developments in Assessing Clinical Competence. Montreal, Canada: Can-Heal Publications, 711–718.
Stillman, P. L. et al. (1991). Is Test Security an Issue in a Multistation Clinical Assessment? — A Preliminary Study. Academic Medicine 66(October Supplement), S25-S27.
Swanson, D. B. (1987). A Measurement Framework for Performance-Based Tests. In: I. Hart & R. Harden (eds.) Further Developments in Assessing Clinical Competence. Montreal: Can-Heal Publications, Inc, 13–42.
Swanson, D. B. & Norcini, J. J. (1989). Factors Influencing the Reproducibility of Tests Using Standardized Patients. Teaching and Learning in Medicine 1, 158–166.
Swanson, D. B., Norcini, J. J. & Grosso, L. J. (1987). Assessment of Clinical Competence: Written and Computer-Based Simulations. Assessment and Evaluation in Higher Education 12(3), 220–246.
Swanson, D. B., Norman, G. R. & Linn, R. (1995). Performance-Based Assessment: Lessons from the Health Professions. Educational Researcher 24(5), 5–11, 35.
Swartz, M. H. et al. (1995). The Effect of Deliberate, Excessive Violations of Test Security on a Standardized-Patient Examination: An Extended Analysis. In: Proceedings of The Sixth Ottawa Conference on Medical Education. Toronto, Canada: University of Toronto Bookstore Custom Publishing, 280–284.
Tamblyn, R. M. (1989). The Use of Standardized Patients in the Evaluation of Clinical Competence: The Evaluation of Selected Measurement Properties. Doctoral Thesis, McGill University, Department of Epidemiology, Montreal.
Tamblyn, R. M. et al. (1991a). The Accuracy of Standardized Patient Presentation. Medical Education 25, 100–109.
Tamblyn, R. M. et al. (1991b). Sources of Unreliability and Bias in Standardized-Patient Rating. Teaching and Learning in Medicine 3, 74–85.
Van der Linden, W. J. & Hambleton, R. K. (1997). Handbook of Modern Item Response Theory. New York: Springer.
Van der Vleuten, C. P. M. (1996). The Assessment of Professional Competence: Developments, Research, and Practical Implications. Advances in Health Sciences Education 1, 41–67.
Van der Vleuten, C. P. M. & Swanson, D. B. (1990). Assessment of Clinical Skills with Standardized Patients: State of the Art. Teaching and Learning in Medicine 2, 58–76.
Whelan, G. P. et al. (in press). Educational Commission for Foreign Medical Graduates Clinical Skills Assessment. Proceedings of the 8th Annual Ottawa Conference.
Whelan, G. P. & Moses, V. K. (1990). The Effect on Grades of the Timing and Site of Third-year Internal Medicine Clerkships. Academic Medicine 65(11), 708–709.
Williams, R. G. et al. (1987). Direct Standardized Assessment of Clinical Competence. Medical Education 21, 482–489.
Williams, R. G., Lloyd, J. S. & Simonton, D. K. (1992). Sources of OSCE Examination Information and Perceived Helpfulness: A Study of the Grapevine. In: I. R. Hart, R. M. Harden & J. Des Marchais (eds.) Current Developments in Assessing Clinical Competence. Montreal, Canada: Can-Heal Publications, 363–370.
Woolliscroft, J. O., Swanson, D. B., Case, S. M. & Ripkey, D. R. (1995). Monitoring the Effectiveness of the Clinical Curriculum: Use of a Cross-Clerkship Exam to Assess Development of Diagnostic Skills. In: A. I. Rothman & R. Cohen (eds.) Proceedings of the Sixth Ottawa Conference on Medical Education. Toronto, Canada: University of Toronto Bookstore Custom Publishing, 476–478.
Wright, B. D. & Masters, G. N. (1982). Rating Scale Analysis. Chicago: MESA Press.
Wright, B. D. & Stone, M. H. (1979). Best Test Design. Chicago: MESA Press.
Cite this article
Swanson, D.B., Clauser, B.E. & Case, S.M. Clinical Skills Assessment with Standardized Patients in High-Stakes Tests: A Framework for Thinking about Score Precision, Equating, and Security. Adv Health Sci Educ Theory Pract 4, 67–106 (1999). https://doi.org/10.1023/A:1009862220473
DOI: https://doi.org/10.1023/A:1009862220473