Advances in Health Sciences Education, Volume 18, Issue 4, pp 551–557

Rater variables associated with ITER ratings

  • Michael Paget
  • Caren Wu
  • Joann McIlwrick
  • Wayne Woloschuk
  • Bruce Wright
  • Kevin McLaughlin

Abstract

Advocates of holistic assessment consider the in-training evaluation report (ITER) a more authentic way to assess performance. But this assessment format is subjective and, therefore, susceptible to rater bias. Here our objective was to study the association between rater variables and ITER ratings. In this observational study our participants were clerks at the University of Calgary and preceptors who completed online ITERs between February 2008 and July 2009. Our outcome variable was the global rating on the ITER (rated 1–5), and we used a generalized estimating equation model to identify rater variables associated with this rating. Students were rated “above expected level” or “outstanding” on 66.4% of the 1050 online ITERs completed during the study period. Two rater variables attenuated ITER ratings: the log-transformed time taken to complete the ITER [β = −0.06, 95% confidence interval (−0.10, −0.02), p = 0.002] and the number of ITERs that a preceptor completed over the study period [β = −0.008 (−0.02, −0.001), p = 0.02]. We found evidence of a leniency bias that resulted in two thirds of students being rated above the expected level of performance. This leniency bias appeared to be attenuated by delay in ITER completion and was also blunted in preceptors who rated more students. Because all biases threaten the internal validity of the assessment process, further research is needed to confirm these and other sources of rater bias in ITER ratings, and to explore ways of limiting their impact.
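The abstract reports coefficients from a generalized estimating equation (GEE) model, which accounts for the clustering of ratings within preceptors. The study's analysis code is not published, so the sketch below is a minimal illustration of how such a model could be fit with Python's statsmodels on simulated data; the column names, the Gaussian family, and the exchangeable working correlation are assumptions made for illustration, not details taken from the study.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1050  # number of ITERs in the study period

# Simulated clustered data: each preceptor completes several ITERs.
# All variable names are hypothetical stand-ins for the study's measures.
preceptor_id = rng.integers(0, 120, size=n)
days_to_complete = rng.lognormal(mean=2.0, sigma=1.0, size=n)
counts = pd.Series(preceptor_id).value_counts()
n_completed = pd.Series(preceptor_id).map(counts).to_numpy()

df = pd.DataFrame({
    "preceptor_id": preceptor_id,
    "days_to_complete": days_to_complete,
    "n_completed": n_completed,
})

# Illustrative outcome: a lenient baseline rating, nudged downward by
# completion delay and rater experience (effect sizes echo the reported betas).
latent = (4.0
          - 0.06 * np.log(df["days_to_complete"])
          - 0.008 * df["n_completed"]
          + rng.normal(0.0, 0.5, size=n))
df["global_rating"] = latent.clip(1, 5).round()

# GEE with an exchangeable working correlation: ratings from the same
# preceptor are treated as correlated rather than independent.
model = smf.gee(
    "global_rating ~ np.log(days_to_complete) + n_completed",
    groups="preceptor_id",
    data=df,
    family=sm.families.Gaussian(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(result.summary())    # betas with standard errors and p-values
print(result.conf_int())   # 95% confidence intervals for each beta
```

Treating a 1–5 global rating as a continuous Gaussian outcome is one plausible reading of the linear β coefficients reported in the abstract; an ordinal model clustered by rater would be a reasonable alternative.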

Keywords

Performance assessment · In-training evaluation · Illusory superiority · Medical students · Regression to the mean

Copyright information

© Springer Science+Business Media B.V. 2012

Authors and Affiliations

  • Michael Paget¹
  • Caren Wu¹
  • Joann McIlwrick¹
  • Wayne Woloschuk¹
  • Bruce Wright¹
  • Kevin McLaughlin¹

  1. Office of Undergraduate Medical Education, Health Sciences Centre, University of Calgary, Calgary, Canada
