What is the validity evidence for assessments of clinical teaching?
Received: 27 July 2005; Revised: 29 July 2005; Accepted: 29 July 2005
Cite this article as: Beckman, T.J., Cook, D.A. & Mandrekar, J.N. J GEN INTERN MED (2005) 20: 1159. doi:10.1111/j.1525-1497.2005.0258.x

Abstract

BACKGROUND: Although a variety of validity evidence should be utilized when evaluating assessment tools, a review of teaching assessments suggested that authors pursue a limited range of validity evidence.

OBJECTIVES: To develop a method for rating validity evidence and to quantify the evidence supporting scores from existing clinical teaching assessment instruments.

DESIGN: A comprehensive search yielded 22 articles on clinical teaching assessments. Using standards outlined by the American Psychological and Education Research Associations, we developed a method for rating the 5 categories of validity evidence reported in each article. We then quantified the validity evidence by summing the ratings for each category. We also calculated weighted κ coefficients to determine interrater reliabilities for each category of validity evidence.

MAIN RESULTS: Content and Internal Structure evidence received the highest ratings (27 and 32, respectively, of 44 possible). Relation to Other Variables, Consequences, and Response Process received the lowest ratings (9, 2, and 2, respectively). Interrater reliability was good for Content, Internal Structure, and Relation to Other Variables (κ range 0.52 to 0.96, all P values <.01), but poor for Consequences and Response Process.

CONCLUSIONS: Content and Internal Structure evidence is well represented among published assessments of clinical teaching. Evidence for Relation to Other Variables, Consequences, and Response Process receives little attention, and future research should emphasize these categories. The low interrater reliability for Response Process and Consequences likely reflects the scarcity of reported evidence. With further development, our method for rating the validity evidence should prove useful in various settings.

Key Words: validity; clinical teaching; evaluation studies
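The rating-and-summing procedure and the weighted κ calculation described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state. The 0 to 2 rating scale is inferred only from the maximum of 44 across 22 articles. The quadratic disagreement weights are one common choice and may not be the one the authors used. The function names `category_total` and `weighted_kappa` and all ratings in the example are hypothetical and are not data from the study.

```python
from collections import Counter


def category_total(ratings):
    """Sum one category's ratings across articles (assumed scale: 0, 1, 2)."""
    return sum(ratings)


def weighted_kappa(rater_a, rater_b, categories=(0, 1, 2), weights="quadratic"):
    """Cohen's weighted kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Disagreement weight: 0 on the diagonal, larger for more distant ratings.
    def w(i, j):
        d = abs(i - j) / (k - 1)
        return d * d if weights == "quadratic" else d

    observed = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    # Weighted observed disagreement and the disagreement expected by chance.
    obs = sum(w(i, j) * observed[(i, j)] / n
              for i in range(k) for j in range(k))
    exp = sum(w(i, j) * marg_a[i] * marg_b[j] / (n * n)
              for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return 1.0 - obs / exp


if __name__ == "__main__":
    # Hypothetical ratings of one evidence category for six articles.
    rater_1 = [2, 1, 0, 2, 1, 0]
    rater_2 = [2, 1, 1, 2, 0, 0]
    print("total:", category_total(rater_1), "of", 2 * len(rater_1), "possible")
    print("weighted kappa:", round(weighted_kappa(rater_1, rater_2), 3))
```

Under the assumed scale, a category total is bounded by twice the number of articles, which is consistent with the 44-point maximum reported for 22 articles. Passing `weights="linear"` switches to linear disagreement weights for comparison.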
None of the authors have any conflicts of interest to declare for this paper.
Presented at the Society of General Internal Medicine 28th Annual Meeting, New Orleans, La, May 11–14, 2005. Also presented at the Association for the Study of Medical Education Annual Scientific Meeting, Newcastle upon Tyne, UK, July 11–13, 2005.
© Society of General Internal Medicine 2005