Sample Demographic Data
Of the 250 participants who completed baseline survey measures, 48.8% identified as women, 50.4% identified as men, and 0.8% elected not to answer the question on gender. The majority (51.2%) were 30 to 39 years of age, 33.2% were younger than 30, 8.8% were 40 to 49, and 6.0% were 50 or older; 0.8% elected not to answer the question on age category. The majority (52.4%) identified their race as white or Caucasian, 39.6% as Asian, and 6.4% as another racial category (i.e., Indian or South Asian, Black or African, or Latino); 1.6% elected not to answer the question on race. Specialties reported by participants were medicine (26.4%), radiology (11.6%), anesthesia (10.4%), pediatrics (9.6%), surgery subspecialty (7.2%), pathology (5.6%), radiology subspecialty (5.6%), psychiatry (4.8%), surgery (4.0%), medicine subspecialty (3.6%), neurology (2.8%), obstetrics/gynecology (2.4%), pediatric subspecialty (0.8%), and emergency medicine (0.8%). One participant (0.4%) reported two specialties, and ten (4.0%) elected not to report a specialty. Of all participants, 185 (74%) were house-staff physicians. Of the 250 study participants, 227 (91%) completed the follow-up survey.
Survey Scale Descriptive Data
Table 1 presents descriptive statistics for the PFI scales and the self-reported medical error scale. All scales showed either a floor or a ceiling effect (responses at the minimum or maximum possible scale score, respectively), but in every case the effect was less than 20%.
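Floor and ceiling percentages of the kind summarized in Table 1 amount to counting respondents at each end of the scale. The following is a minimal Python sketch, assuming scores on the 0 to 4 average-item metric; the function name `floor_ceiling` and the scores are hypothetical, not study data.

```python
def floor_ceiling(scores, scale_min, scale_max):
    """Percentage of respondents at the scale minimum (floor) and maximum (ceiling)."""
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == scale_min) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == scale_max) / n
    return floor_pct, ceiling_pct


# Hypothetical scale scores (not study data) on the 0-4 average-item metric.
scores = [0.0, 0.5, 1.25, 2.0, 4.0, 3.5, 1.5, 2.75, 1.0, 3.0]
floor_pct, ceiling_pct = floor_ceiling(scores, 0.0, 4.0)
print(f"floor {floor_pct:.1f}%, ceiling {ceiling_pct:.1f}%")  # floor 10.0%, ceiling 10.0%
```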
Of the 227 participants who completed measures at both time points, 100 (44.1%) had stable sleep-related impairment scores (change < 2 points; scale range 8 to 40) and thus would be expected to have more stable measures of work exhaustion, depersonalization, overall burnout, and professional fulfillment over time (see “Methods”). Table 2 presents Cronbach’s alpha and test-retest reliability estimates (correlations between time-one and time-two scores, obtained 2 to 3 weeks apart) for each scale in this subsample (n = 100) of participants with stable sleep-related impairment scores. These estimates indicate good internal consistency and test-retest reliability for the PFI scales.
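The reliability estimates in Table 2 rest on two standard computations: Cronbach’s alpha, k/(k − 1) × (1 − sum of item variances / variance of total scores), and a Pearson correlation between time-one and time-two scores, restricted to respondents whose sleep-related impairment changed by less than 2 points. The Python sketch below illustrates both under our own assumptions; the helper names (`cronbach_alpha`, `pearson_r`, `stable_indices`) and all data are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def cronbach_alpha(items):
    """items: one list of responses per item, respondents in the same order."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]  # per-respondent total score
    item_var_sum = sum(pvariance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))


def pearson_r(x, y):
    """Pearson correlation, used here as the test-retest reliability estimate."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def stable_indices(sleep_t1, sleep_t2, max_change=2):
    """Respondents whose sleep-related impairment changed by less than max_change points."""
    return [i for i, (a, b) in enumerate(zip(sleep_t1, sleep_t2)) if abs(b - a) < max_change]


# Hypothetical data for five respondents (not study data).
items = [[1, 2, 3, 4, 0], [1, 3, 3, 4, 1], [2, 2, 4, 3, 0]]
alpha = cronbach_alpha(items)

sleep_t1 = [20, 15, 30, 22, 18]
sleep_t2 = [21, 19, 30, 23, 10]
pfi_t1 = [1.5, 2.0, 3.0, 0.5, 2.5]
pfi_t2 = [1.75, 1.0, 2.75, 0.75, 3.5]

keep = stable_indices(sleep_t1, sleep_t2)  # [0, 2, 3]
retest_r = pearson_r([pfi_t1[i] for i in keep], [pfi_t2[i] for i in keep])
print(round(alpha, 2), keep, round(retest_r, 2))
```

Using population variances throughout is a deliberate simplification: the n − 1 factors cancel in the alpha ratio, so the result matches the sample-variance formula.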
Sensitivity to Detect Expected Change
Table 2 also shows correlations between changes in PROMIS sleep-related impairment scores and changes in PFI scores from baseline to follow-up. The ROC analysis results, also presented in Table 2, suggest that all PFI scales are adequately sensitive to the expected effects of changes of two or more points in sleep-related impairment scores over the same period (baseline to follow-up, 2 to 3 weeks later).
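One way to read these ROC analyses is as a question of how well a change in a PFI score separates participants whose sleep-related impairment changed by two or more points from those whose scores did not. The Python sketch below assumes (our assumption; the exact coding is not given in this section) that the positive state is an increase of at least two points and that the predictor is the change in a PFI burnout score. It computes the AUC as the probability that a randomly chosen positive case has a larger predictor value than a randomly chosen negative case, counting ties as one half. The data are hypothetical.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the probability that a positive case outranks a negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical follow-up minus baseline changes (not study data).
sleep_change = [3, 0, 5, -1, 2, 1]                 # PROMIS sleep-related impairment
burnout_change = [0.5, 0.1, 0.25, -0.2, 0.0, 0.3]  # PFI burnout score
expected_change = [d >= 2 for d in sleep_change]   # assumed positive state
print(round(auc_mann_whitney(burnout_change, expected_change), 3))  # 0.667
```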
Convergent Validity of PFI Scores with Previously Validated Measures
Table 2 presents correlations between PFI scores at time one and their closest MBI scale equivalents. Correlations between the conceptually similar PFI work exhaustion and MBI emotional exhaustion scales are high. The correlation between the conceptually similar PFI interpersonal disengagement and MBI depersonalization scales is also relatively high (≥ 0.5), although lower. The correlation between the conceptually related but more substantively different PFI professional fulfillment and MBI personal accomplishment scales is moderate (≥ 0.3 and < 0.5).
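The correlation-size labels used in this section (small: > 0.1 and < 0.3; moderate: ≥ 0.3 and < 0.5; high: ≥ 0.5) can be written as a small helper so they are applied consistently to the absolute value of r. The function name `describe_r` and the “negligible” label for |r| ≤ 0.1 are our own assumptions; the text does not name that range.

```python
def describe_r(r):
    """Label |r| with the thresholds used in this section (small, moderate, high)."""
    size = abs(r)
    if size >= 0.5:
        return "high"
    if size >= 0.3:
        return "moderate"
    if size > 0.1:
        return "small"
    return "negligible"  # assumed label; the text does not name this range


print(describe_r(-0.27), describe_r(0.35), describe_r(0.66))  # small moderate high
```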
Convergent and Discriminant Validity of PFI Items and Factor Validity
We used factor analysis to justify creation of the PFI subscales. The Kaiser-Meyer-Olkin measure of sampling adequacy (0.93) and Bartlett’s test of sphericity (p < 0.001) indicated that the data were suitable for principal component analysis. Eigenvalues for components 1, 2, and 3 were 8.62, 1.77, and 1.15, respectively. The next highest eigenvalue was 0.67, suggesting that retaining a three-component measurement model was reasonable. The three components (professional fulfillment, interpersonal disengagement, and work exhaustion) explained 53.9%, 11.0%, and 7.2% of the combined PFI item set variance, respectively. Although these percentages cannot be meaningfully summed after oblique rotation, considered informally they suggest that the three-component measurement model accounts for the majority of variance in the PFI item set. Perhaps most importantly, conceptual meaningfulness informed our decision to retain three factors: the separation of PFI items into these three components is consistent with the face validity of the items within each component-based set and provides a conceptually meaningful measurement schema. Table 3 presents the component loadings from the principal components analysis. The pattern matrix results demonstrate adequate convergent and discriminant validity of the PFI items for measuring three components: (1) professional fulfillment (six items), (2) interpersonal disengagement (six items), and (3) work exhaustion (four items).
Scale scores for each component were calculated as the average of the item scores (range 0 to 4) within each scale. Fewer than five respondents left one or more items blank; for these respondents, we calculated the average item score for a scale if they had responded to at least 75% of its items. The two burnout components, work exhaustion and interpersonal disengagement, are highly correlated (r = 0.66).
Both burnout components are negatively correlated with professional fulfillment (work exhaustion: r = −0.59; interpersonal disengagement: r = −0.64).
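Two computations described above can be checked with a short Python sketch. For a principal component analysis of a correlation matrix, the share of total variance an unrotated component explains is its eigenvalue divided by the number of items; with the 16 PFI items (6 + 6 + 4), the reported eigenvalues give approximately the reported percentages (1.77 / 16 is about 11.1%, against the reported 11.0%, a gap consistent with rounded eigenvalues). The eigenvalue-greater-than-one count is included only as a common heuristic, not necessarily the authors’ rule. The function `scale_score` is a hypothetical implementation of the stated scoring rule: average the answered items, but only when at least 75% of the scale’s items were answered.

```python
def variance_explained(eigenvalues, n_items):
    """Percent of total variance per (unrotated) component of a correlation-matrix PCA."""
    return [100.0 * ev / n_items for ev in eigenvalues]


def retained_by_kaiser(eigenvalues):
    """Count components with eigenvalue > 1 (a common heuristic, not necessarily the authors' rule)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


def scale_score(responses, min_fraction=0.75):
    """Average item score (0-4); None if fewer than min_fraction of items were answered."""
    answered = [r for r in responses if r is not None]
    if len(answered) < min_fraction * len(responses):
        return None
    return sum(answered) / len(answered)


eigenvalues = [8.62, 1.77, 1.15, 0.67]  # first four, as reported above
print([round(p, 1) for p in variance_explained(eigenvalues[:3], 16)])  # [53.9, 11.1, 7.2]
print(retained_by_kaiser(eigenvalues))  # 3
print(scale_score([4, 3, None, 3]))     # 3.3333333333333335 (3 of 4 items answered)
print(scale_score([4, None, None, 3]))  # None (2 of 4 items answered)
```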
The MBI depersonalization scale and the PFI interpersonal disengagement scale both had moderate correlations with self-reported medical errors (Table 4). The internal consistency reliability estimate for the self-reported medical error scale is questionable (α = 0.62). This relatively low estimate may be attributable to the small number of items in this four-item scale, since Cronbach’s alpha depends in part on the number of scale items. We therefore also calculated the mean inter-item correlation for this scale (0.28), which is acceptable.
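The dependence of alpha on scale length can be illustrated with the standardized form of Cronbach’s alpha, which depends only on the number of items k and the mean inter-item correlation r̄: α = k·r̄ / (1 + (k − 1)·r̄). The Python sketch below plugs in the mean inter-item correlation reported above (0.28). With four items it gives about 0.61, close to but not the same statistic as the reported raw alpha of 0.62 (raw alpha uses covariances rather than correlations), and it shows how the same r̄ would yield higher alpha for hypothetical longer scales.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


for k in (4, 8, 12):  # 4 = actual scale length; 8 and 12 are hypothetical
    print(k, round(standardized_alpha(k, 0.28), 2))
# 4 0.61
# 8 0.76
# 12 0.82
```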
MBI emotional exhaustion, PFI work exhaustion, and PFI overall burnout (the average score across all PFI burnout items) all had small (> 0.1 and < 0.3) but statistically significant correlations with self-reported medical errors. Neither MBI personal accomplishment nor PFI professional fulfillment correlated significantly with self-reported medical errors. All MBI and PFI scales correlated moderately or highly, in the expected directions, with the PROMIS sleep-related impairment, depression symptom, and anxiety symptom scales; the one exception was the correlation between MBI personal accomplishment and sleep-related impairment, which was −0.27 (Table 4). Figure 1 shows the dose-response relationship between PFI burnout score quartile and medical errors, sleep-related impairment, depression, and anxiety. All correlations of the MBI and PFI scales with WHOQOL-BREF physical, psychological, social, and environmental domain quality of life scores were moderate to high, except for smaller but statistically significant correlations of social quality of life scores with PFI interpersonal disengagement, MBI depersonalization, and MBI personal accomplishment scores (Table 4).
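A quartile-based dose-response summary like Figure 1 can be sketched by splitting participants at the quartiles of their PFI burnout score and averaging an outcome within each group. The Python below is a minimal illustration; the quantile method (the `statistics.quantiles` default), the handling of values that fall exactly on a cut point, the helper name `quartile_means`, and the data are all our own assumptions.

```python
from bisect import bisect_right
from statistics import mean, quantiles


def quartile_means(predictor, outcome):
    """Mean outcome within each quartile (1-4) of the predictor."""
    cuts = quantiles(predictor, n=4)  # three cut points (default 'exclusive' method)
    groups = {q: [] for q in range(1, 5)}
    for x, y in zip(predictor, outcome):
        groups[bisect_right(cuts, x) + 1].append(y)  # values equal to a cut go to the upper group
    return {q: mean(ys) for q, ys in groups.items() if ys}


# Hypothetical burnout scores and depression scores (not study data).
burnout = [0.2, 0.5, 0.8, 1.0, 1.4, 1.7, 2.3, 3.1]
depression = [48, 50, 52, 51, 56, 58, 63, 66]
print(quartile_means(burnout, depression))
```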
ROC Analyses and Cut-Points for PFI Professional Fulfillment and Burnout Scales
To our knowledge, no other validated measure of professional fulfillment currently exists. We therefore conducted an ROC analysis using the first item of the WHOQOL-BREF: “How would you rate your quality of life?” Response options for this item are “very poor,” “poor,” “neither poor nor good,” “good,” and “very good.” With a response of “very good” set as the positive state, the ROC analysis estimated an area under the curve (AUC) of 0.81 (95% CI = 0.74–0.78) for the PFI professional fulfillment scale. Using a cut-point of an average item score of 3.0 or greater (scale range = 0 to 4), the professional fulfillment scale had a sensitivity of 0.73 and a specificity of 0.79 for identifying physicians who rated their quality of life as “very good.” Note that in this context, sensitivity refers to the proportion of participants who rated their quality of life as “very good” who are identified by the PFI as experiencing professional fulfillment. This differs from “sensitivity to change,” discussed elsewhere in this manuscript, which refers to a measure’s ability to detect change over time.
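Sensitivity and specificity at a fixed cut-point reduce to a 2 × 2 cross-tabulation of the rule “score ≥ cut-point” against the reference standard, here a WHOQOL-BREF item 1 response of “very good.” The Python sketch below uses the 3.0 cut-point from the text; the function name and all data are hypothetical.

```python
def sensitivity_specificity(scores, positive, cutoff):
    """Sensitivity and specificity of the rule 'score >= cutoff' against a reference standard."""
    tp = sum(1 for s, y in zip(scores, positive) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, positive) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, positive) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, positive) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical professional fulfillment scores and WHOQOL-BREF item 1 responses.
fulfillment = [3.5, 3.0, 2.5, 3.2, 1.8, 2.9, 3.6, 2.0]
qol = ["very good", "very good", "very good", "good",
       "poor", "good", "good", "neither poor nor good"]
very_good = [r == "very good" for r in qol]  # positive state
sens, spec = sensitivity_specificity(fulfillment, very_good, cutoff=3.0)
print(round(sens, 2), round(spec, 2))  # 0.67 0.6
```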
We also ran three separate ROC analyses for the PFI burnout composite scale (the average of all burnout items, including both work exhaustion and interpersonal disengagement items), with the positive state set as burnout indicated by (1) high emotional exhaustion or high depersonalization on the MBI, (2) the West et al. method using two MBI items, and (3) the single-item burnout measure. The AUC estimates for the PFI burnout scale against these three reference measures were 0.85 (95% CI = 0.81–0.90), 0.81 (95% CI = 0.76–0.87), and 0.87 (95% CI = 0.82–0.92), respectively. Using an average item score cut-point of 1.33 or greater (scale range = 0 to 4), the sensitivity of the PFI burnout scale for identifying participants classified as experiencing burnout by each of these three previously published methods was 0.72, 0.72, and 0.85, respectively, and the specificity was 0.84, 0.77, and 0.76, respectively.
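This section does not state how the AUC confidence intervals were obtained. As one common option, and not necessarily the authors’ method, a percentile bootstrap that resamples burnout-positive and burnout-negative participants separately can be sketched in Python as follows. The function names, the number of resamples, the seed, and the data are hypothetical.

```python
import random


def auc(pos, neg):
    """Probability that a positive case scores higher than a negative case (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(pos, neg, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap CI, resampling each group separately so both stay non-empty."""
    rng = random.Random(seed)
    stats = sorted(
        auc([rng.choice(pos) for _ in pos], [rng.choice(neg) for _ in neg])
        for _ in range(n_boot)
    )
    tail = (1 - level) / 2
    lower = stats[int(tail * n_boot)]
    upper = stats[int((1 - tail) * n_boot) - 1]
    return auc(pos, neg), lower, upper


# Hypothetical PFI burnout scores split by a reference burnout classification.
burnout_pos = [1.8, 2.4, 1.2, 2.9, 1.6, 0.9, 2.1]
burnout_neg = [0.4, 1.0, 0.7, 1.5, 0.3, 0.8, 1.1, 0.6]
est, lo, hi = bootstrap_auc_ci(burnout_pos, burnout_neg)
print(f"AUC {est:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Resampling within each group keeps both classes present in every replicate, so the AUC is always defined.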
Table 5 shows the proportion of participants identified as experiencing significant burnout by the PFI burnout scale and by each of the three previously published methods. Table 5 also shows the mean differences in self-reported medical errors and depression between participants with and without burnout as identified by each method, together with the Cohen’s d effect size for each difference. Independent-samples t tests indicated that all mean group differences were statistically significant with one exception: self-reported medical errors did not differ significantly between participants identified as experiencing burnout and those identified as not experiencing burnout by the single-item self-identified burnout measure.
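Cohen’s d expresses a mean difference in pooled standard deviation units. The following is a minimal Python sketch of d and of the pooled-variance (Student) independent-samples t statistic. Whether the authors used pooled or Welch-type variances is not stated here, and the helper names and data are hypothetical. The p-value requires the t distribution, which the standard library lacks; `scipy.stats.ttest_ind` is one option if SciPy is available.

```python
from math import sqrt
from statistics import mean, variance


def pooled_variance(a, b):
    """Pooled sample variance of two independent groups."""
    na, nb = len(a), len(b)
    return ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    return (mean(a) - mean(b)) / sqrt(pooled_variance(a, b))


def student_t(a, b):
    """Pooled-variance independent-samples t statistic and its degrees of freedom."""
    se = sqrt(pooled_variance(a, b) * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se, len(a) + len(b) - 2


# Hypothetical self-reported medical error scores (not study data).
with_burnout = [3, 4, 5]
without_burnout = [1, 2, 3]
print(cohens_d(with_burnout, without_burnout))  # 2.0
print(student_t(with_burnout, without_burnout))
```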
There were no significant differences between house-staff (residents or fellows) and attending physicians in the proportion experiencing significant burnout: 41% versus 37%, respectively, using the PFI burnout scale [χ²(1) = 0.26; p = 0.61], and 50% versus 48% using the MBI [χ²(1) = 0.08; p = 0.77]. The proportion of house-staff and attending physicians experiencing significant professional fulfillment was 34% in both groups.
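The house-staff versus attending comparisons are chi-square tests of independence on 2 × 2 tables (group by burnout status). The following is a minimal Python sketch using only the standard library. It omits the Yates continuity correction (the text does not say whether one was applied), and the counts are hypothetical, so its output is not expected to reproduce the reported statistics.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table, df = 1."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts: rows = house-staff, attending; columns = burnout, no burnout.
chi2, p = chi_square_2x2([[30, 45], [12, 20]])
print(round(chi2, 2), round(p, 2))
```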
Only seven participants (< 3%) were identified by the PFI as experiencing both professional fulfillment and burnout. Of all 250 participants, 98 (39%, including these seven) were identified by the PFI as experiencing burnout; 75 (30%) were identified as experiencing neither burnout nor professional fulfillment; and 77 (31%) were identified as experiencing professional fulfillment without burnout. Figure 2 shows Cohen’s d effect sizes (in standard deviation units) for differences in mean WHOQOL-BREF scores between physicians experiencing burnout and two comparison groups: physicians experiencing neither burnout nor professional fulfillment, and physicians experiencing professional fulfillment without burnout.
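The three groups can be formed from the two cut-points given earlier in this section: a burnout average item score of 1.33 or greater and a professional fulfillment average item score of 3.0 or greater. In the Python sketch below, which is a hypothetical helper rather than the authors’ code, burnout takes precedence. This is consistent with the counts above summing to 250, with the seven participants who met both cut-points counted in the burnout group.

```python
def pfi_group(burnout, fulfillment, burnout_cut=1.33, fulfillment_cut=3.0):
    """Assign one of three mutually exclusive PFI groups from average item scores (0-4)."""
    if burnout >= burnout_cut:
        return "burnout"  # includes respondents who also meet the fulfillment cut-point
    if fulfillment >= fulfillment_cut:
        return "professional fulfillment without burnout"
    return "neither burnout nor professional fulfillment"


# Hypothetical respondents as (burnout score, professional fulfillment score).
for scores in [(1.5, 3.2), (0.5, 3.4), (1.0, 2.0)]:
    print(scores, pfi_group(*scores))
```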