Advances in Health Sciences Education, Volume 8, Issue 1, pp 27–47

Quality Assurance Methods for Performance-Based Assessments

  • John R. Boulet
  • Danette W. McKinley
  • Gerald P. Whelan
  • Ronald K. Hambleton


Performance assessments are subject to many potential error sources. For performance-based assessments, including standardized patient (SP) examinations, these error sources, if left unchecked, can compromise the validity and reliability of scores. Quality assurance (QA) measures, both quantitative and qualitative, can be used to ensure that candidate scores are accurate and reasonably free from measurement error. The purpose of this paper is to outline several QA strategies that can be used to identify potential content- and score-related problems with SP assessments. These approaches include case analyses and various comparisons of primary and observer scores. Specific examples from the ECFMG Clinical Skills Assessment (CSA®) are used to educate the reader concerning appropriate statistical methods and legitimate data interpretations. The results presented in this investigation highlight the need for well-defined training regimes, regular feedback to those involved in rating/scoring performances, and detailed statistical analyses of all scores.

Keywords: clinical skills, performance assessment, psychometrics, quality assurance, standardized patients





Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • John R. Boulet (1)
  • Danette W. McKinley (1)
  • Gerald P. Whelan (1)
  • Ronald K. Hambleton (2)

  1. Research and Evaluation, Educational Commission for Foreign Medical Graduates (ECFMG), Philadelphia, USA
  2. University of Massachusetts, USA
