Abstract
Daily evaluations of certified registered nurse anesthetists’ (CRNAs’) work habits by anesthesiologists should be adjusted for rater leniency. The current study tested the hypothesis that there is a pairwise association by rater between the leniencies of evaluations of CRNAs’ daily work habits and of didactic lectures. The historical cohorts comprised anesthesiologists’ evaluations over 53 months of CRNAs’ daily work habits and over 65 months of didactic lectures by visiting professors and faculty. The binary endpoints were whether the Likert scale scores for all 6 items (work habits) and all 10 items (lectures) equaled the maximum of 5, or not. Mixed effects logistic regression estimated the odds of each ratee performing above or below average, adjusted for rater leniency. Bivariate errors-in-variables least squares linear regression estimated the association between the leniency of the anesthesiologists’ evaluations of work habits and of didactic lectures. There were 29/107 (27%) raters who were more severe in their evaluations of CRNAs’ work habits than other anesthesiologists (two-sided P < 0.01); 34/107 (32%) raters were more lenient. When evaluating lectures, 3/81 (4%) raters were more severe and 8/81 (10%) were more lenient. Among the 67 anesthesiologists rating both, leniency (or severity) for work habits was not associated with that for lectures (P = 0.90; unitless slope between logits 0.02, 95% confidence interval −0.34 to 0.30). Rater leniency is of large magnitude when raters make daily clinical evaluations, even with a valid and psychometrically reliable instrument. Rater leniency was context dependent, not solely a reflection of raters’ personality or rating style.
Notes
For CRNAs’ daily work habits, covariates not significantly associated with their scores included the number of times the ratee worked with the rater, the number of times the ratee was evaluated by the rater, percent time ratee worked with rater and rater completed evaluation, number of cases started by ratee, days worked by ratee, ratio of cases started to days worked by the ratee, intraoperative hours divided by patient care days, percent cases with patient age < 13 years, percent cases evenings or weekends, percent cases with patient’s American Society of Anesthesiologists’ Physical Status >3, percent with American Society of Anesthesiologists’ base units >8, percent cases with break or handoff, percent cases at the hospital (main) surgical suite, and percent cases at the ambulatory surgery center [16].
The sorting ensures the minimum change because, as explained below in the subsection “Analyses of work habit scores,” the analyses are binary.
This calculation was performed using StatXact 11.1, Cytel, Cambridge, MA.
Using 0.01 < P < 0.05, there were 3/107 (2.8%) ratees below average and 7/107 (6.5%) above average. When the calculations were repeated without the first two years entered as a binary fixed effect, there were still 35/107 (32.7%) ratees who were significantly (P < 0.01) below or above average.
With the robust, clustered standard errors, there were 63 raters significantly different from the others at P < 0.01, where 63 = 29 + 34. Using asymptotic standard errors, there were 69 raters with P < 0.01. This was the expected and desired result of using robust standard errors. The implication is that use of the robust estimators did not incorrectly result in markedly underestimated standard errors.
For the current study, for each rater with every score equaling the maximum, we changed the score from 5.00 to 4.83, where 4.83 = (5 items × 5 points + 1 item × 4 points) / (6 items). We did so because we were investigating the raters. For routine use when evaluating ratees, there would be no reason (that we are aware of) to change the scores; such a rater’s scores would instead be removed entirely.
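A quick worked check of the adjusted value: 4.83 is the mean across the six work-habit items when one item is reduced from 5 to 4.

```python
# Worked check: six work-habit items, one reduced from the maximum of 5 to 4.
scores = [5, 5, 5, 5, 5, 4]
adjusted = sum(scores) / len(scores)   # 29 / 6
print(round(adjusted, 2))              # 4.83
```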
References
Hamilton TE (2004) Centers for Medicare & Medicaid Services (CMS) requirements for hospital medical staff privileging. S&C-05-04. Centers for Medicare & Medicaid Services. https://www.cms.gov/Medicare/Provider-Enrollment-and-Certification/SurveyCertificationGenInfo/Downloads/SCletter05-04.pdf. Accessed 14 May 2020
The Joint Commission (2011) Standards BoosterPak™ for focused professional practice evaluation/ongoing professional practice evaluation (FPPE/OPPE). Oakbrook Terrace, Illinois
Wikipedia (2020) High-stakes testing. https://en.wikipedia.org/wiki/High-stakes_testing. Accessed 14 May 2020
Dexter F, Bayman EO, Wong CA, Hindman BJ (2020) Reliability of ranking anesthesiologists and nurse anesthetists using leniency-adjusted clinical supervision and work habits scores. J Clin Anesth 61:109639
Ehrenfeld JM, Henneman JP, Peterfreund RA, Sheehan TD, Xue F, Spring S, Sandberg WS (2012) Ongoing professional performance evaluation (OPPE) using automatically captured electronic anesthesia data. Jt Comm J Qual Patient Saf 38:73–80
Bayman EO, Dexter F, Todd MM (2015) Assessing and comparing anesthesiologists’ performance on mandated metrics using a Bayesian approach. Anesthesiology 123:101–115
Bayman EO, Dexter F, Todd MM (2016) Prolonged operative time to extubation is not a useful metric for comparing the performance of individual anesthesia providers. Anesthesiology 124:322–338
Dexter F, Hindman BJ (2016) Do not use hierarchical logistic regression models with low incidence outcome data to compare anesthesiologists in your department. Anesthesiology 125:1083–1084
Epstein RH, Dexter F, Schwenk ES (2017) Hypotension during induction of anaesthesia is neither a reliable nor useful quality measure for comparison of anaesthetists’ performance. Br J Anaesth 119:106–114
Dexter F, Masursky D, Szeluga D, Hindman BJ (2016) Work habits are valid component of evaluations of anesthesia residents based on faculty anesthesiologists’ daily written comments about residents. Anesth Analg 122:1625–1633
Dexter F, Ledolter J, Hindman BJ (2014) Bernoulli cumulative sum (CUSUM) control charts for monitoring of anesthesiologists’ performance in supervising anesthesia residents and nurse anesthetists. Anesth Analg 119:679–685
Bayman EO, Dexter F, Ledolter J (2017) Mixed effects logistic regression modeling of daily evaluations of nurse anesthetists’ work habits adjusting for leniency of the rating anesthesiologists. PCORM 6:14–19
Dexter F, Ledolter J, Hindman BJ (2017) Measurement of faculty anesthesiologists’ quality of clinical supervision has greater reliability when controlling for the leniency of the rating anesthesia resident: a retrospective cohort study. Can J Anesth 64:643–655
Dexter F, Ledolter J, Smith TC, Griffiths D, Hindman BJ (2014) Influence of provider type (nurse anesthetist or resident physician), staff assignments, and other covariates on daily evaluations of anesthesiologists' quality of supervision. Anesth Analg 119:670–678
Dexter F, Ledolter J, Epstein R, Hindman BJ (2017) Operating room anesthesia subspecialization is not associated with significantly greater quality of supervision of anesthesia residents and nurse anesthetists. Anesth Analg 124:1253–1260
Dexter F, Ledolter J, Hindman BJ (2017) Validity of using a work habits scale for the daily evaluation of nurse anesthetists’ clinical performance while controlling for the leniencies of the rating anesthesiologists. J Clin Anesth 42:63–68
Logvinov II, Dexter F, Hindman BJ, Brull SD (2017) Anesthesiologists’ perceptions of minimum acceptable work habits of nurse anesthetists. J Clin Anesth 38:107–110
Bernardin HJ, Cooke DK, Villanova P (2000) Conscientiousness and agreeableness as predictors of rating leniency. J Appl Psychol 85:232–236
Spence JR, Keeping LM (2010) The impact of non-performance information on ratings of job performance: A policy-capturing approach. J Organ Behav 31:587–608
Dewberry C, Davies-Muir A, Newell S (2013) Impact and causes of rater severity/leniency in appraisals without postevaluation communication between raters and ratees. Int J Sel Assess 21:286–293
Dannefer EF, Henson LC, Bierer SB, Grady-Weliky TA, Meldrum S, Nofziger AC, Barclay C, Epstein RM (2005) Peer assessment of professional competence. Med Educ 39:713–722
O’Brien MK, Dexter F, Kreiter CD, Slater-Scott C, Hindman BJ (2019) Nurse anesthetists’ evaluations of anesthesiologists’ operating room performance are sensitive to anesthesiologists’ years of postgraduate practice. J Clin Anesth 54:102–110
University of Iowa Carver College of Medicine (2007) Peer evaluation of teaching. https://www.medicine.uiowa.edu/facultyaffairs/sites/medicine.uiowa.edu.facultyaffairs/files/wysiwyg_uploads/PeerTeachingEvaluation.pdf. Accessed 14 May 2020
melogit — Multilevel mixed-effects logistic regression. Stata. https://www.stata.com/manuals13/memelogit.pdf. Accessed 14 May 2020
Sribney B (2020) Advantages of the robust variance estimator. Stata. https://www.stata.com/support/faqs/statistics/robust-variance-estimator/. Accessed 14 May 2020
Nichols A, Schaffer M (2007) Clustered errors in Stata. Stata. https://www.stata.com/meeting/13uk/nichols_crse.pdf. Accessed 14 May 2020
Glance LG, Dick AW (2016) In response. Anesth Analg 122:1722–1727
Robust and clustered standard errors. Stata. https://www.stata.com/manuals/semintro8.pdf. Accessed 14 May 2020
York D (1969) Least squares fitting of a straight line with correlated errors. Earth Planet Sci Lett 5:320–324
Williamson JH (1968) Least-squares fitting of a straight line. Can J Phys 46:1845–1847
Cantrell CA (2008) Review of methods for linear least-squares fitting of data and application to atmospheric chemistry problems. Atmos Chem Phys 8:5477–5487
Tellinghuisen J (2010) Least-squares analysis of data with uncertainty in x and y: A Monte Carlo methods comparison. Chemom Intell Lab Syst 103:160–169
Dexter F, Hadlandsmyth K, Pearson ACS, Hindman BJ (2020) Reliability and validity of performance evaluations of pain medicine clinical faculty by residents and fellows using a supervision scale. Anesth Analg. https://doi.org/10.1213/ANE.0000000000004779
Webb NM, Shavelson RJ, Haertel EH (2006) Reliability coefficients and generalizability theory. Handbook of Statistics 26:81–124
Jeon Y, Meretoja R, Vahlberg T, Leino-Kilpi H (2020) Developing and psychometric testing of the anaesthesia nursing competence scale. J Eval Clin Pract 26:866–878
Müller T, Montano D, Poinstingl H, Dreiling K, Schiekirka-Schwake S, Anders S, Raupach T, von Steinbüchel N (2017) Evaluation of large-group lectures in medicine - development of the SETMED-L (Student Evaluation of Teaching in MEDical Lectures) questionnaire. BMC Med Educ 17:137
Perella P, Palmer E, Conway R, Wong DJN (2019) A retrospective analysis of case-load and supervision from a large anaesthetic logbook database. Anaesthesia 74:1524–1533
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
None.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
ESM 1
(PDF 99 kb)
Rights and permissions
About this article
Cite this article
Dexter, F., Ledolter, J., Wong, C.A. et al. Association between leniency of anesthesiologists when evaluating certified registered nurse anesthetists and when evaluating didactic lectures. Health Care Manag Sci 23, 640–648 (2020). https://doi.org/10.1007/s10729-020-09518-0