Medical Science Educator, Volume 28, Issue 3, pp 543–551

How Much Is Too Much? Imposed and Perceived Evaluative Demands Among Physician Educators

  • Courtney J. Lloyd
  • Melissa R. Alexander
  • Adam B. Wilson
Original Research


The number and frequency of physician-performed assessments of medical trainees are on the rise within medical academia. Because the consequences of these higher assessment demands remain unknown, it is timely to establish a framework for monitoring the imposed and perceived evaluative responsibilities of physician educators. This study explored the imposed and perceived evaluative responsibilities among different populations of physician educators who complete clinical clerkship evaluations (CCEs), using the constructs “evaluative strain,” “evaluative assignment,” and “evaluative activity.” An evaluative strain instrument was administered in 2016 at Indiana University School of Medicine to physician educators who evaluate third-year medical trainees using CCEs. Evaluative assignment and evaluative activity were estimated using CCEs as a proxy for the assigned and completed volume of evaluations. Evaluative strain, assignment, and activity scores were reported globally and compared across medical departments. Evaluative strain was regressed on evaluative assignment and evaluative activity to determine the extent of their relationships. Physician educators had moderate evaluative strain scores (M = 45.4/100, SD = 19.4), with OBGYN physicians reporting significantly higher scores (p ≤ 0.015) than other departments. The “temporal demands” dimension of evaluative strain was perceived to be the most influential aspect of the evaluative process. Neither evaluative assignment nor evaluative activity was significantly related to evaluative strain (p ≥ 0.61). This research provided evidence of moderate evaluative strain levels among participating physicians. The study may serve as a framework for future research aimed at monitoring changes in the evaluative demands placed on physician educators.


Keywords: Medical student assessment; Performance-based assessments; Clinical clerkship evaluations; Evaluation fatigue; Survey fatigue



The authors wish to thank the physicians from IUSM for participating in the study, the medical school leadership for their recruitment efforts, and Joel Smith from the IUSM Office of Medical Student Education for his generous assistance with data collection.

Compliance with Ethical Standards

Competing Interests

The authors declare that there is no conflict of interest.

Ethical Approval

This study was granted exemption status by the local Institutional Review Board at Indiana University School of Medicine (Protocol No. 1604657202) on June 9, 2016.



Copyright information

© International Association of Medical Science Educators 2018

Authors and Affiliations

  1. Department of Physician Assistant Studies, University of Saint Francis, Fort Wayne, USA
  2. Indianapolis, USA
  3. Department of Cell and Molecular Medicine, Rush University, Chicago, USA
