Data Sources and Searches
We followed guidelines provided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA).30 A systematic literature search of Ovid MEDLINE, PsycINFO, Web of Science, CINAHL, and ProQuest Dissertations & Theses was conducted to identify all studies involving health professionals, burnout, and quality of care through March 2015. The full electronic search string used for PsycINFO included the following: ((DE “occupational stress”) AND (SU “burnout”)) AND ((DE “quality of care”) OR (DE “quality of services”) OR (DE “satisfaction”) OR (DE “client satisfaction”) OR (DE “safety”) OR (DE “perceived quality”)). Similar search strategies were used for the other databases. All search strategies and the coding protocol are available from the authors.
We included published and unpublished studies of any design (e.g., cross-sectional surveys, intervention studies), as long as empirical data were used to assess the relationship between burnout and quality (including patient satisfaction) and/or safety; if these variables were assessed but the bivariate association was not reported, we contacted the authors to gather additional data for analyses. Attempts were made to contact 63 authors: 21 provided usable data, 6 responded that their data did not meet inclusion criteria or could not be obtained, 3 could not be located, and 33 did not respond. Although review articles were not included in our analyses, we examined their reference sections to identify primary studies for inclusion.
We retained articles that specifically examined burnout. The Maslach31 three-dimensional scale of burnout (emotional exhaustion, depersonalization or cynicism, and reduced personal accomplishment) was used most often, although any study measuring at least one dimension of burnout or a global burnout score was included. We focused on healthcare providers and excluded studies of burnout in other occupations (e.g., education, probation officers, vocational rehabilitation). We categorized quality of care along two dimensions: perceived quality (rating scales or items reflecting provider’s perception, patient satisfaction) and safety (perceived safety, adverse events, “near misses,” medical errors). Included studies are briefly summarized in Table 1 of the supplemental online material.
Data Extraction and Quality Assessment
Articles were coded independently by a pair of coders (from a group of six coders comprising a clinical psychologist and five doctoral students). To maintain consistency and ensure reliability, coders met to review and come to consensus for each independent sample. We extracted information on burnout type and measure(s) used, quality and safety indicators, provider type (nurses, physicians, interdisciplinary), setting (outpatient, inpatient, or mixed inpatient/outpatient), and country (coded by region: North America, South America, Europe, Asia, Australia, Africa). Where available, we coded provider characteristics (age, gender, experience/length of time in the field) and patient characteristics (age, gender). We extracted information on potential methods-related moderators including study year, unit of analysis (individual, dyad, service unit, hospital/organization), and quality or safety data source (provider, patient, observer, medical records).
We rated the quality of each study to account for bias in individual studies (see Table 2 of the supplemental online material). Because quality rubrics commonly recommended for meta-analyses32,33 include items not relevant for correlational designs (e.g., blinding, allocation of intervention), we created items based on common sources of bias in observational studies.34,35 We tested and refined the initial rating system on several studies before rating the full sample. Two raters independently coded each study; disagreements were resolved through discussion. Measures of central tendency suggested a score of eight as a natural cutpoint (mean = 8.12; median and mode = 8). Following other meta-analyses that examined subgroups based on quality ratings,36–38 we used quality rating as a moderator, comparing effect sizes of studies with high quality ratings (8 or above) to those of studies scoring below 8.
Data Synthesis and Analysis
We extracted effect size information at the level of the burnout–quality (or burnout–safety) relationship. All associations were first converted into Pearson's correlations, and Fisher's Z-transformation was applied to adjust for the non-normal distribution of Pearson's r. When a study reported multiple measures of the same construct, we averaged the effect sizes, weighted by sample size, to maintain statistical independence.39 We calculated an overall relationship, with one effect size per independent sample, to describe the relationships between burnout and perceived quality and safety. We conducted separate meta-analyses to examine the relationships aggregated at the level of the predictor (burnout type) and at the level of the quality indicator (perceived quality and safety). We conducted moderator analyses for perceived quality and safety.
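For illustration, the conversion and within-study averaging steps described above can be sketched as follows; the function names and example values are our own, not drawn from the review's data:

```python
import math

def r_to_z(r):
    """Fisher's Z-transformation of a Pearson correlation."""
    return 0.5 * math.log((1 + r) / (1 - r))  # equivalently math.atanh(r)

def z_to_r(z):
    """Back-transform a Fisher Z value to a correlation."""
    return math.tanh(z)

def pooled_within_study(rs, ns):
    """Average multiple effect sizes from one study, weighted by sample
    size, so the study contributes a single independent effect size."""
    zs = [r_to_z(r) for r in rs]
    mean_z = sum(n * z for n, z in zip(ns, zs)) / sum(ns)
    return z_to_r(mean_z)

# Hypothetical study reporting two burnout-quality correlations
pooled = pooled_within_study([0.25, 0.35], [120, 80])  # ≈ 0.291
```

Averaging on the Z scale and back-transforming avoids the slight bias introduced by averaging raw correlations directly.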
We used a random effects model to calculate the mean effect sizes using Comprehensive Meta-Analysis (CMA) software, version 2.40 At the aggregate level, Z-scores and confidence intervals were examined to determine the statistical significance of each association. The strength of the mean effect sizes was interpreted in light of Cohen's41 recommendation for correlations, where 0.10 is small, 0.30 is medium, and 0.50 is large. We conducted one-study-removed sensitivity analyses to determine whether any single sample unduly influenced the results (indicating a potential outlier); because the point estimate of the mean effect size did not change substantially upon removal of any study, we performed the remainder of the analyses with the full sample.
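The random effects pooling performed by CMA can be approximated with the DerSimonian–Laird estimator, a standard method-of-moments approach; this sketch assumes Fisher-Z effect sizes with sampling variance 1/(n − 3) and is illustrative, not a reproduction of the software's exact computation:

```python
def dersimonian_laird(effects, variances):
    """Random effects pooled mean via the DerSimonian-Laird estimator.
    `effects` are Fisher-Z effect sizes; `variances` are their sampling
    variances (1 / (n - 3) for Fisher-Z correlations)."""
    w = [1.0 / v for v in variances]          # fixed-effect weights
    fe_mean = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fe_mean) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)             # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return mean, se

# Three hypothetical independent samples (n = 50, 100, 150)
mean, se = dersimonian_laird([0.2, 0.3, 0.5], [1/47, 1/97, 1/147])
```

A one-study-removed sensitivity check simply re-runs this pooling k times, omitting one sample each time, and compares the resulting point estimates.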
We examined heterogeneity with the Q-statistic and the I² index; a significant Q-statistic indicates that moderation may be present, and the I² index quantifies the extent of the heterogeneity, ranging from 0 to 100 %, with higher values indicating greater heterogeneity.42–44 Although I² is of value in determining the need for moderator analyses, it does not speak to the source of heterogeneity or dispersion of effects.45 We used I² values of 25 % or more as a cutoff to examine the presence of moderators, as this suggests that between-study variability in effect size is greater than expected by chance.43 To document dispersion of effects, we report 95 % confidence intervals for each effect size.
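Both statistics follow directly from the inverse-variance framework; a minimal sketch with invented values:

```python
def heterogeneity(effects, variances):
    """Cochran's Q and the I2 index for a set of effect sizes.
    I2 estimates the proportion of total variability in effect sizes
    attributable to between-study heterogeneity rather than chance."""
    w = [1.0 / v for v in variances]
    mean = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - mean) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Hypothetical samples; an I2 above 25 % would trigger moderator analyses
q, i2 = heterogeneity([0.2, 0.3, 0.5], [1/47, 1/97, 1/147])
```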
We tested study-level moderators for both quality and safety, including year, type of report, provider type and setting, region, and quality of study. Additional moderators for burnout and perceived quality included burnout type, quality source, and unit of analysis. Additional moderator analyses for safety compared perceived safety (e.g., questionnaires) versus events (e.g., reported adverse events, near misses). For categorical moderators, we used an analysis of variance (ANOVA) analog. To test continuous moderator variables, we conducted random effects meta-regressions using unrestricted maximum likelihood estimation. Because meta-regressions use listwise deletion, each moderator was examined independently to maximize the number of studies included in the analysis. Continuous moderators were considered significant if beta weights were significant and I² decreased. We interpreted statistical tests at p < 0.05. All moderator analyses were conducted in CMA, version 2.40
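For the categorical case, the ANOVA analog partitions heterogeneity into between-group and within-group components; a simplified fixed-weight sketch of the between-group portion, with hypothetical subgroups:

```python
def q_between(groups):
    """Between-group heterogeneity (ANOVA analog) for a categorical
    moderator. Each group is a (effects, variances) pair; the statistic
    compares weighted group means against the weighted grand mean."""
    group_w, group_means = [], []
    for effects, variances in groups:
        w = [1.0 / v for v in variances]
        group_w.append(sum(w))
        group_means.append(sum(wi * y for wi, y in zip(w, effects)) / sum(w))
    grand = sum(W * m for W, m in zip(group_w, group_means)) / sum(group_w)
    return sum(W * (m - grand) ** 2 for W, m in zip(group_w, group_means))

# Two hypothetical provider-type subgroups with different mean effects
nurses = ([0.2, 0.2], [0.01, 0.01])
physicians = ([0.4, 0.4], [0.01, 0.01])
qb = q_between([nurses, physicians])
```

A large Q-between relative to its chi-square reference distribution (df = number of groups − 1) would indicate that the moderator accounts for part of the heterogeneity.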
Finally, we assessed the potential influence of publication bias by examining funnel plots and testing for asymmetry using Egger’s46 regression approach. Although Egger’s test may be prone to bias in low-power situations, our sample size was well beyond the recommended minimum of ten samples.47 In addition, Failsafe N was not appropriate because of the high level of study heterogeneity and the random effects model used.39
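Egger's approach regresses the standardized effect on precision and inspects the intercept, with values far from zero suggesting funnel-plot asymmetry; a minimal sketch (the data and function name are illustrative):

```python
import numpy as np

def egger_intercept(effects, std_errors):
    """Egger's regression test for funnel-plot asymmetry: regress the
    standardized effect (effect / SE) on precision (1 / SE) and return
    the intercept. Intercepts far from zero suggest small-study bias."""
    prec = np.array([1.0 / se for se in std_errors])
    t = np.array(effects) / np.array(std_errors)
    slope, intercept = np.polyfit(prec, t, 1)
    return intercept

# With a constant true effect and no small-study bias, the standardized
# effect is proportional to precision, so the intercept is ~0
ses = [0.05, 0.08, 0.10, 0.12, 0.20]
intercept = egger_intercept([0.3] * len(ses), ses)
```

In practice the intercept's significance would be judged with a t-test; this sketch returns only the point estimate.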