
Rater variables associated with ITER ratings

Advances in Health Sciences Education

Abstract

Advocates of holistic assessment consider the in-training evaluation report (ITER) a more authentic way to assess performance, but this assessment format is subjective and therefore susceptible to rater bias. Our objective was to study the association between rater variables and ITER ratings. In this observational study, our participants were clinical clerks at the University of Calgary and the preceptors who completed online ITERs between February 2008 and July 2009. Our outcome variable was the global rating on the ITER (rated 1–5), and we used a generalized estimating equation model to identify variables associated with this rating. Students were rated “above expected level” or “outstanding” on 66.4% of the 1050 online ITERs completed during the study period. Two rater variables attenuated ITER ratings: the log-transformed time taken to complete the ITER [β = −0.06, 95% confidence interval (−0.10, −0.02), p = 0.002], and the number of ITERs that a preceptor completed over the study period [β = −0.008 (−0.02, −0.001), p = 0.02]. We found evidence of leniency bias that resulted in two-thirds of students being rated above the expected level of performance. This leniency bias appeared to be attenuated by delay in ITER completion, and was also blunted in preceptors who rated more students. Because all biases threaten the internal validity of the assessment process, further research is needed to confirm these and other sources of rater bias in ITER ratings, and to explore ways of limiting their impact.
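The analysis described above can be sketched in code. The following is a minimal, hypothetical reconstruction in Python using statsmodels: the input file, the column names (rating, days_to_complete, n_iters_completed, preceptor_id), and the exchangeable working correlation are assumptions for illustration, not details reported by the authors.

```python
# Hypothetical sketch of the GEE analysis described in the abstract.
# Column names and the working-correlation choice are assumptions;
# the paper does not publish its analysis code.
import numpy as np
import pandas as pd
import statsmodels.api as sm

# One row per completed online ITER (assumed input file and columns)
df = pd.read_csv("iters.csv")

# The paper log-transforms the time taken to complete the ITER
df["log_time"] = np.log(df["days_to_complete"])

# GEE accounts for clustering of ratings within preceptor; an identity
# link treats the 1-5 global rating as approximately continuous
model = sm.GEE.from_formula(
    "rating ~ log_time + n_iters_completed",
    groups="preceptor_id",
    data=df,
    family=sm.families.Gaussian(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(result.summary())  # beta coefficients, 95% CIs, p-values
```

Under these assumptions, the fitted coefficients on log_time and n_iters_completed correspond to the β values reported in the abstract; the robust standard errors from the GEE fit yield the 95% confidence intervals and p-values.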



Author information

Correspondence to Kevin McLaughlin.


Cite this article

Paget, M., Wu, C., McIlwrick, J. et al. Rater variables associated with ITER ratings. Adv in Health Sci Educ 18, 551–557 (2013). https://doi.org/10.1007/s10459-012-9391-y
