Journal of General Internal Medicine, Volume 7, Issue 2, pp 174–179

The inter-rater reliability and internal consistency of a clinical evaluation exercise

  • Frank J. Kroboth
  • Barbara H. Hanusa
  • Susan Parker
  • John L. Coulehan
  • Wishwa N. Kapoor
  • Frank H. Brown
  • Michael Karpf
  • Gerald S. Levey
Original Articles


Objective: To assess the internal consistency and inter-rater reliability of a clinical evaluation exercise (CEX) format designed to be easy to use yet detailed enough to achieve uniform recording of the observed examination.

Design: A comparison of 128 CEXs conducted for 32 internal medicine interns by full-time faculty. This paper reports alpha coefficients as measures of internal consistency and several measures of inter-rater reliability.
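As an illustration of the internal-consistency measure named above, the following minimal Python sketch computes Cronbach's alpha from a table of item ratings. The function name `cronbach_alpha`, the data layout (one row per observed exercise, one column per rated item), and the sample numbers are assumptions made for illustration; none of them come from the study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (hypothetical layout).

    Each row is one observed exercise; each column is one rated item.
    """
    k = len(scores[0])  # number of items
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two rows")
    # Sample variance of each item (column) across rows.
    item_vars = [variance(col) for col in zip(*scores)]
    # Sample variance of the per-row total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Made-up ratings: three exercises scored on two items.
example = [[2, 3], [4, 4], [6, 5]]
alpha = cronbach_alpha(example)
```

Alpha approaches 1 when the items rise and fall together across exercises, which is the pattern a high internal-consistency value reflects.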

Setting: A university internal medicine program. Observations were conducted at the end of the internship year.

Participants: Thirty-two interns were evaluated; the observers were 12 full-time faculty members in the department of medicine. The entire intern group was chosen in order to optimize the spectrum of abilities represented. Patients were recruited by the chief resident from the inpatient medical service based on their ability and willingness to participate.

Intervention: Each intern was observed twice, with two examiners present during each CEX. The examiners were given a standardized preparation and used a format developed over five years of previous pilot studies.

Measurements and main results: The format appeared to have excellent internal consistency; alpha coefficients ranged from 0.79 to 0.99. However, inter-rater reliability was considerably lower, and multiple methods of determining it yielded similar results: intraclass correlations ranged from 0.23 to 0.50, and generalizability coefficients ranged from a low of 0.00 for the overall rating of the CEX to a high of 0.61 for the physical examination section. Transforming scores to eliminate rater effects and dichotomizing results into pass-fail did not appear to enhance the reliability results.
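To make the intraclass correlation concrete, here is a minimal Python sketch of a one-way random-effects, single-rater intraclass correlation. The abstract does not state which form of the coefficient the authors computed, so the choice of this form, the function name `icc_oneway`, and the sample ratings are all assumptions made for illustration.

```python
def icc_oneway(ratings):
    """One-way random-effects, single-rater intraclass correlation.

    ratings: list of rows, one per observed exercise (hypothetical layout);
    each row holds the scores given by the k examiners of that exercise.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two exercises and two raters")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    # Between-exercise mean square.
    bms = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-exercise mean square (disagreement between examiners).
    wms = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)


# Made-up scores: four exercises, two examiners each.
example = [[7, 5], [6, 8], [4, 5], [8, 6]]
icc = icc_oneway(example)
```

The coefficient shrinks as disagreement between the two examiners of the same exercise grows relative to the differences between exercises.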

Conclusions: Although the CEX is a valuable didactic tool, its psychometric properties preclude reliable assessment of clinical skills as a one-time observation.

Key words

clinical evaluation exercise; inter-rater reliability; education; performance assessment





Copyright information

© Society of General Internal Medicine 1992

Authors and Affiliations

  • Frank J. Kroboth (1)
  • Barbara H. Hanusa
  • Susan Parker
  • John L. Coulehan
  • Wishwa N. Kapoor
  • Frank H. Brown
  • Michael Karpf
  • Gerald S. Levey
  1. University of Pittsburgh, Internal Medicine, Pittsburgh
