OBJECTIVE: Existing systems of in-training evaluation (ITE) have been criticized as unreliable and invalid methods for assessing student performance during clinical education. The purpose of this study was to assess the feasibility, reliability, and validity of a clinical work sampling (CWS) approach to ITE. This approach focused on the following: (1) basing performance data on observed behaviors, (2) using multiple observers and occasions, (3) recording data at the time of performance, and (4) providing a feasible system for feedback.
PARTICIPANTS: Sixty-two third-year University of Ottawa students were assessed during their 8-week internal medicine inpatient experience.
MEASUREMENTS AND MAIN RESULTS: Four performance rating forms (Admission Rating Form, Ward Rating Form, Multidisciplinary Team Rating Form, and Patient’s Rating Form) were introduced to document student performance. Voluntary participation rates were variable (12%–64%), and patient ratings were excluded from the analysis because of a low response rate (12%). The mean number of evaluations per student per rotation (19) exceeded the number of evaluations needed to achieve sufficient reliability. Reliability coefficients were high for the Ward Form (.86) and the Admission Form (.73) but not for the Multidisciplinary Team Form (.22). There was an examiner effect (rater leniency), but this was small relative to real differences between students. Correlations between the Ward Form and the Admission Form were high (.47), while those with the Multidisciplinary Team Form were lower (.37 and .26, respectively). The CWS approach to ITE was considered to be content valid by expert judges.
CONCLUSIONS: The collection of ongoing performance data was reasonably feasible, reliable, and valid.