BACKGROUND: Although a variety of validity evidence should be utilized when evaluating assessment tools, a review of teaching assessments suggested that authors pursue a limited range of validity evidence.
OBJECTIVES: To develop a method for rating validity evidence and to quantify the evidence supporting scores from existing clinical teaching assessment instruments.
DESIGN: A comprehensive search yielded 22 articles on clinical teaching assessments. Using standards outlined by the American Psychological and Education Research Associations, we developed a method for rating the 5 categories of validity evidence reported in each article. We then quantified the validity evidence by summing the ratings for each category. We also calculated weighted κ coefficients to determine interrater reliabilities for each category of validity evidence.
MAIN RESULTS: Content and Internal Structure evidence received the highest ratings (27 and 32, respectively, of 44 possible). Relation to Other Variables, Consequences, and Response Process received the lowest ratings (9, 2, and 2, respectively). Interrater reliability was good for Content, Internal Structure, and Relation to Other Variables (κ range 0.52 to 0.96, all P values <.01), but poor for Consequences and Response Process.
CONCLUSIONS: Content and Internal Structure evidence is well represented among published assessments of clinical teaching. Evidence for Relation to Other Variables, Consequences, and Response Process receive little attention, and future research should emphasize these categories. The low interrater reliability for Response Process and Consequences likely reflects the scarcity of reported evidence. With further development, our method for rating the validity evidence should prove useful in various settings.
Beckman TJ, Ghosh AK, Cook DA, Erwin PJ, Mandrekar JN. How reliable are assessments of clinical teaching? A review of the published instruments. J Gen Intern Med. 2004;19:971–7.PubMedCrossRefGoogle Scholar
Beckman TJ, Lee MC, Rohren CH. Evaluating an instrument for the peer review of inpatient teaching. Med Teach. 2003;25:131–5.PubMedCrossRefGoogle Scholar
Benbassat J, Bachar E. Validity of students’ ratings of clinical instructors. Med Educ. 1981;15:373–6.PubMedGoogle Scholar
Copeland HL, Hewson MG. Developing and testing an instrument to measure the effectiveness of clinical teaching in an academic medical center. Acad Med. 2000;75:161–6.PubMedCrossRefGoogle Scholar
Donnelly MB, Woolliscroft JO. Evaluation of clinical instructors by third year medical students. Acad Med. 1989;64:159–64.PubMedCrossRefGoogle Scholar
Donner-Banzhoff N, Merle H, Baum E, Basler HD. Feedback for general practice trainers: developing and testing a standardized instrument using the importance-quality-score method. Med Educ. 2003;37:772–7.PubMedCrossRefGoogle Scholar
Guyatt GH, Nishikawa J, Willan A, et al. A measurement process for evaluating clinical teachers in internal medicine. Can Med Assoc J. 1993;149:1097–102.Google Scholar
Hayward RA, Williams BC, Gruppen LD, Rosenbaum D. Measuring attending physician performance in a general medicine outpatient clinic. J Gen Intern Med. 1995;10:504–10.PubMedCrossRefGoogle Scholar
Irby DM, Rakestraw P. Evaluating clinical teaching in medicine. J Med Educ. 1981;56:181–6.PubMedGoogle Scholar
James PA, Osborne JW. A measure of medical instructional quality in ambulatory settings: the MedIQ. Fam Med. 1999;31:263–9.PubMedGoogle Scholar
Litzelman DK, Westmorland GR, Skeff KM, Stratos GA. Student and resident evaluations of faculty—how reliable are they? Acad Med. 1999;74(suppl):s25–7.PubMedCrossRefGoogle Scholar
Litzelman DK, Stratos GA, Marriott DJ, Skeff KM. Factorial validation of a widely disseminated educational framework for evaluating clinical teachers. Acad Med. 1998;73:688–95.PubMedCrossRefGoogle Scholar
McGill MK, McClure C, Commerford K. A system for evaluating teaching in the ambulatory setting. Fam Med. 1986;18:173–4.Google Scholar
McLeod PJ, James CA, Abrahamowicz M. Clinical tutor evaluation: a 5-year study by students on an in-patient service and residents in an ambulatory care clinic. Med Educ. 1993;27:48–53.PubMedCrossRefGoogle Scholar
Ramsbottom-Lucier MT, Gillmore GM, Irby DM, Ramsey PG. Evaluation of clinical teaching by general internal medicine faculty in outpatient and inpatient settings. Acad Med. 1994;69:152–4.PubMedCrossRefGoogle Scholar
Risucci DA, Lutsky L, Rosati RJ, Tortolani AJ. Reliability and accuracy of resident evaluations of surgical faculty. Eval Health Prof. 1992;15:313–24.PubMedCrossRefGoogle Scholar
Shellenberger S, Mahan JM. A factor analytic study of teaching in off-campus general practice clerkships. Med Educ. 1982;16:151–5.PubMedGoogle Scholar
Solomon DJ, Speer AJ, Rosebraugh CJ, DiPette DJ. The reliability of medical student ratings of teaching. Eval Health Prof. 1997;20:343–52.PubMedCrossRefGoogle Scholar
Steiner IP, Franc-Law J, Kelly KD, Rowe BH. Faculty evaluation by residents in an emergency medicine program: a new evaluation instrument. Acad Emerg Med. 2000;7:1015–21.PubMedGoogle Scholar
Williams BC, Litzelman DK, Babbott SF, Lubitz RM, Hofer TP. Validation of a global measure of faculty’s clinical teaching performance. Acad Med. 2002;77:177–80.PubMedCrossRefGoogle Scholar
Smith CA, Varkey AB, Evans AT, Reilly BM. Evaluating the performance of inpatient attending physicians: a new instrument for today’s teaching hospitals. J Gen Intern Med. 2004;19:766–71.PubMedCrossRefGoogle Scholar
American Education Research Association and American Psychological Association. Standards for Educational and Psychological Testing. Washington, DC: American Education Research Association; 1999.Google Scholar
Messick S Validity. In: Linn RL, ed. Educational Measurement. 3rd ed. Phoenix, Ariz: Oryx Press; 1993.Google Scholar
Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas. 1973;33:613–9.CrossRefGoogle Scholar
Fleiss JL, Cohen J, Everitt BS. Large sample standard errors of kappa and weighted kappa. Psychol Bull. 1969;72:323–7.CrossRefGoogle Scholar