Background

A growing number of randomized trials show that when healthcare practitioners are encouraged to enhance how they express empathy, this can reduce patient pain, [1, 2] lower patient anxiety, [3] increase patient satisfaction, [4, 5] improve medication adherence, [6, 7] and improve other patient health outcomes [8,9,10,11]. For example, Chassany’s [1] empathy training intervention for general practitioners (GPs) (n = 180) reduced pain in osteoarthritis patients (n = 842) by one point on a 10-point VAS (P < 0.0001). These modest benefits are comparable to those of many pharmaceutical interventions, without the adverse events. Hence some authors have recently called for efforts to encourage empathic care [12].

Supporting the view that empathic care should be encouraged, the extent to which healthcare practitioners express empathy seems to be lacking in some cases, [13,14,15,16] and it may decline with time in practice [17]. The increased burden of paperwork, which takes up a quarter of practitioner time, [18] may be a barrier to empathic care. However, we do not know the prevalence of inadequate empathy. If adequate empathy is rare, then patients and practitioners would both likely benefit if practitioners improved how they display empathy. In this study, we aimed to address this gap by conducting a systematic review of patient ratings of practitioner empathy.

An obstacle to empathy research is that practitioner empathy is difficult to define theoretically [19, 20]. At the same time, there is an emerging consensus that empathy can be operationalized as a healthcare practitioner’s ability to understand a patient’s point of view, express this understanding, and make a recommendation that reflects the shared understanding [21, 22]. More importantly for present purposes, while empathy is measured using different scales, [23, 24] only one patient-rated measure of practitioner empathy has demonstrated evidence of reliability, [25] internal validity and consistency: the consultation and relational empathy (CARE) measure [25, 26]. From a patient health perspective, patient ratings of practitioner empathy are likely to be important. We therefore limited our review to studies that used the CARE measure.

Objectives

Our primary objective was to measure the extent to which patients (of any type) report their healthcare practitioners (of any type) to be empathic. Our secondary objective was to compare empathy ratings between practitioner groups (male versus female, shorter versus longer consultation times, different types of practitioners, and practitioners in different countries).

Methods

Protocol and registration

The protocol for this review was published in PROSPERO (record no. CRD42016037456). We made two changes to the protocol. In the protocol we proposed to analyze CARE scores before and after training; however, there were insufficient studies to complete this analysis. We also had insufficient data to perform the proposed analysis comparing practitioners with 10 years or more of experience with those who had less than 10 years of experience. Neither of these changes was related to our main study aim.

Eligibility criteria

We included any study where patients rated their practitioners’ empathy using the CARE measure. We included ratings of any practitioner including nurses, doctors, alternative practitioners, and medical students. We included studies in any language, provided that the translation of the CARE questionnaire was validated.

We excluded studies that used other measures of empathy, because only CARE has been validated. An added benefit of this approach is that it reduced heterogeneity. We excluded studies where practitioners were reported to have been trained in empathy prior to being rated by patients, since we were interested in pre-training empathy ratings. Where a publication included surveys of more than one group of practitioners, the surveys were treated independently.

CARE asks patients to answer 10 questions about the consultation with their practitioner, such as whether the practitioner made the patient feel at ease, really listened and understood, showed compassion, and explained things clearly (see Additional file 1). Each question is answered by ticking one of five rating options (poor, fair, good, very good, excellent) or ‘does not apply’, with the lowest rating being given a score of ‘1’, and the highest a score of ‘5’. Hence, the maximum CARE score is 50. The developers of the CARE measure have produced normative values based on administration of their questionnaire [27]. They found that the mean CARE score was 45.75, and that 5% of CARE scores fell above 48.32, and 5% fell below 40.72.
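The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the developers' scoring tool: the names `care_score` and `position_vs_norms` are hypothetical, and because the text does not say how ‘does not apply’ responses are scored, the sketch simply leaves such questionnaires unscored.

```python
from typing import Optional, Sequence

# Rating labels and scores as described in the text (poor = 1 ... excellent = 5).
RATING_SCORES = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}

# Normative values reported by the CARE developers [27].
NORM_MEAN = 45.75
NORM_LOWER_5PCT = 40.72  # 5% of scores fell below this value
NORM_UPPER_5PCT = 48.32  # 5% of scores fell above this value


def care_score(responses: Sequence[str]) -> Optional[int]:
    """Sum the 10 item ratings; return None if any item is 'does not apply'.

    Assumption: the source does not describe how 'does not apply' is handled,
    so this sketch does not compute a total for such questionnaires.
    """
    if len(responses) != 10:
        raise ValueError("CARE has 10 questions")
    normalized = [r.strip().lower() for r in responses]
    if any(r == "does not apply" for r in normalized):
        return None
    return sum(RATING_SCORES[r] for r in normalized)


def position_vs_norms(score: float) -> str:
    """Place a CARE score relative to the developers' normative cut-offs."""
    if score < NORM_LOWER_5PCT:
        return "lowest 5%"
    if score > NORM_UPPER_5PCT:
        return "highest 5%"
    return "middle 90%"
```

For example, a questionnaire with every item rated ‘excellent’ gives the maximum score of 50, and one with every item rated ‘poor’ gives the minimum of 10.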

Information sources and search

We searched the following databases: MEDLINE (OvidSP) [1946–09/03/2016], Embase (OvidSP) [1974 to 2016 March 08], PsycINFO (OvidSP) [1967–09/03/2016], CINAHL (EBSCOHost), Science & Social Science Indexes (Web of Science, Thomson Reuters) [1945–09/03/2016], Cochrane Central Register of Controlled Trials [Issue 2 of 12, February 2016], Cochrane Database of Systematic Reviews [Issue 3 of 12, March 2016] and Database of Abstracts of Reviews of Effects [Issue 2 of 4, April 2015] (via Cochrane Library, Wiley) and PubMed (see Additional file 2 for search strategy). We also searched the Web of Science Core Collection, Scopus and Google Scholar for studies that cited the CARE measure, [25] and for any record that included the full name of the measure (consultation and relational empathy). Additionally, we contacted authors of studies to ask whether they were aware of any additional studies.

Data collection, extraction, and management

After two authors (JH, KM) piloted the extraction sheet, two authors (LS, AU) independently screened all titles and abstracts and extracted data. Discrepancies were resolved through discussion with a third author (JH). We extracted data about: type of practitioner, percentage of female practitioners, country, average CARE score, and individual CARE scores (where available).

We assessed risk of bias within studies by measuring response rates. It was not feasible to assess risk of bias across studies, for example by constructing a funnel plot, since there was no reason to expect CARE scores to be higher (or lower) depending on sample size. There were insufficient data to investigate risk of bias across studies.

Statistical analyses were performed using the program Comprehensive Meta Analysis [28]. We provided the mean and 95% confidence interval of the CARE score. We contacted study authors via email to obtain missing data with respect to participants, outcomes, or summary data. Participant data were analysed as reported. We conducted preplanned subgroup analyses to assess the extent to which proportion of female practitioners, consultation duration, type of practitioner, and country played a role. To evaluate the predictive value of gender and consultation time with respect to CARE scores we performed a multivariable regression analysis, with gender and consultation time included as the independent variables, and CARE scores included as the dependent variable.
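As a rough illustration of how a pooled mean and its 95% confidence interval can be obtained from study-level means, the following Python sketch uses fixed-effect inverse-variance weighting. This is an assumption for illustration only: the text does not state which model was used in Comprehensive Meta Analysis, the function name `pooled_mean_ci` is hypothetical, and the numbers in the usage note are made up.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def pooled_mean_ci(
    means: Sequence[float], ses: Sequence[float], level: float = 0.95
) -> Tuple[float, float, float]:
    """Inverse-variance (fixed-effect) pooled mean with a normal-theory CI.

    means: study-level mean CARE scores
    ses:   standard errors of those means
    Returns (pooled mean, lower bound, upper bound).
    """
    if not means or len(means) != len(ses):
        raise ValueError("means and ses must be non-empty and of equal length")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se**2 for se in ses]
    total_weight = sum(weights)
    pooled = sum(w * m for w, m in zip(weights, means)) / total_weight
    se_pooled = math.sqrt(1.0 / total_weight)
    # Critical value for a two-sided interval (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled
```

With two hypothetical studies that have means of 40.0 and 44.0 and equal standard errors of 1.0, `pooled_mean_ci([40.0, 44.0], [1.0, 1.0])` returns a pooled mean of 42.0 with an interval of roughly 40.61 to 43.39. A random-effects model would widen this interval when studies disagree more than sampling error alone would predict.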

Sensitivity and subgroup analyses

We conducted four preplanned subgroup analyses.

  1. Longer (>10 min) consultations compared with shorter (≤10 min) consultations. This was based on average consultation times in UK general practice [29].

  2. Gender: average empathy ratings of mostly (>50%) female practitioners compared with average ratings of mostly (>50%) male practitioners.

  3. Country: when there were at least three studies from the same country, we conducted a subgroup analysis for that country and compared it with the complement. We chose three studies because fewer than three makes meta-analysis problematic and increases the likelihood of basing conclusions on anomalous results.

  4. Types of practitioners (physicians, medical students, alternative practitioners, etc.): if there were at least three studies that measured patient ratings of a specific type of practitioner, we conducted a subgroup analysis of this group and compared it with the complement.

Results

Main results

Our search yielded 392 independent records, of which 69 studies met our inclusion criteria (see Supplemental Material). Of these, 64 independent study groups (within 51 publications) had sufficient data to be included in our meta-analysis (see Table 1, Fig. 1, Additional file 3). See Additional file 4 for excluded studies.

Table 1 Study groups included in meta-analysis (n = 64 published in 51 articles)
Fig. 1 PRISMA flow diagram

The 64 study groups were from 13 different countries: UK (n = 23), USA (n = 6), Hong Kong (n = 9), Germany (n = 7), Australia (n = 4), China (n = 6), Ethiopia (n = 2), South Korea (n = 2), and one study from each of Brazil, Croatia, France, India, and Japan. The types of practitioners included primary care physicians, practitioners of Traditional Chinese Medicine (TCM), medical students, allied health professionals, and other specialists.

The average CARE score for the 64 study groups was 40.48 (95% CI, 39.24 to 41.72) (see Table 2, Fig. 2). Twenty-two studies reported consultation times. Longer consultations (≥10 min; n = 13) scored higher (42.60, 95% CI 40.69 to 44.52) than shorter (<10 min; n = 9) consultations (34.93, 95% CI 32.66 to 37.21). This difference of 7.67 points (15%) between longer and shorter consultations was highly significant (P < 0.001). Twelve studies provided data on the gender of practitioners (Table 2). Studies with predominantly female practitioners (n = 6) showed higher empathy scores (42.77, 95% CI 38.98 to 46.56) than those with predominantly male practitioners (n = 6, 34.85, 95% CI 30.98 to 38.71). This difference of 7.92 points (16%) was statistically significant (P = 0.004).
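The subgroup comparisons above can be checked approximately from the reported means and confidence intervals alone. The Python sketch below recovers each standard error from its 95% CI and applies a two-sided z-test to the difference. This is an approximation under a normality assumption and is not necessarily the test the analysis software ran; the helper names are hypothetical. It also expresses each difference as a share of the 50-point maximum, which appears to be how the percentages in the text were derived.

```python
import math
from statistics import NormalDist
from typing import Dict, Tuple

MAX_CARE_SCORE = 50
Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def se_from_ci(lower: float, upper: float) -> float:
    """Recover a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * Z_95)


def compare_subgroups(
    mean_a: float, ci_a: Tuple[float, float],
    mean_b: float, ci_b: Tuple[float, float],
) -> Dict[str, float]:
    """Approximate z-test for the difference between two subgroup means."""
    diff = mean_a - mean_b
    # Assumes the two subgroups are independent.
    se_diff = math.sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se_diff
    p = 2.0 * NormalDist().cdf(-abs(z))  # two-sided p-value
    return {
        "difference": diff,
        "percent_of_max": 100.0 * diff / MAX_CARE_SCORE,
        "z": z,
        "p": p,
    }


# Values as reported in the text.
duration = compare_subgroups(42.60, (40.69, 44.52), 34.93, (32.66, 37.21))
gender = compare_subgroups(42.77, (38.98, 46.56), 34.85, (30.98, 38.71))
```

Under these assumptions the sketch gives differences of 7.67 and 7.92 points (about 15% and 16% of the maximum score), with p-values in line with the reported P < 0.001 and P = 0.004.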

Table 2 Summary of results from subgroup analyses
Fig. 2 Comparison of average CARE score within subgroups

Fifty-five study groups could be included in the pre-planned subgroup analysis by country (Table 2). Highest empathy scores were found in Australia (n = 4, 44.88, 95% CI 42.63 to 47.14), USA (n = 6, 44.56, 95% CI 42.71 to 46.40) and UK (n = 23, 43.07, 95% CI 42.11 to 44.04). Scores were lowest in Hong Kong (n = 9, 33.46, 95% CI 31.94 to 34.99). Scores in Germany (n = 7, 40.72, 95% CI 39.02 to 42.44) and China (n = 6, 40.61, 95% CI 38.68 to 42.55) were in-between. We added an exploratory analysis by country including all 64 study groups and found that scores in India (n = 1, 29.49, 95% CI 24.18 to 34.80) were lower than those in Hong Kong. Scores in the UK, USA and Australia were highest (See Additional file 5).

At least three studies each measured empathy in the following types of providers: physicians, medical students, allied health professionals, and practitioners of Traditional Chinese Medicine (Table 2). There was statistically significant heterogeneity between these groups (P = 0.032), with allied health professionals scoring the highest (n = 5, 45.29, 95% CI 41.38 to 49.20), and physicians scoring the lowest (n = 39, 39.68, 95% CI 38.29 to 41.08). We found no differences between primary care physicians, specialists, and complementary and alternative medicine (CAM) providers (P = 0.386) (see Table 3).

Table 3 CARE scores by physician specialty

A multivariable regression analysis was performed to analyze the predictive value of gender and consultation time with respect to CARE scores. Consultation duration was the only significant predictor for CARE scores (Table 4).

Table 4 Multivariable regression analysis, with proportion of female practitioners and consultation time as independent variables and CARE scores as dependent variable (n = 8)
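To make the structure of such a regression concrete, the following Python sketch fits an ordinary least-squares model with two predictors by solving the normal equations. It is a minimal, unweighted sketch and an assumption about the general technique only; the analysis software may have weighted studies differently. The function names are hypothetical, and the eight data rows are invented so that the true coefficients are known. They are not data from this review.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations apply to both sides.
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def fit_ols(predictors: Sequence[Sequence[float]], outcome: Sequence[float]) -> List[float]:
    """Return [intercept, b1, b2, ...] minimising the sum of squared errors."""
    design = [[1.0, *row] for row in predictors]  # prepend intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical study-level rows: (proportion female, consultation minutes).
rows = [(0.2, 6.0), (0.8, 7.0), (0.5, 10.0), (0.3, 15.0),
        (0.9, 18.0), (0.6, 22.0), (0.4, 12.0), (0.7, 9.0)]
# Outcomes constructed from known coefficients, so the fit can be verified.
scores = [30.0 + 2.0 * female + 0.5 * minutes for female, minutes in rows]
intercept, b_female, b_minutes = fit_ols(rows, scores)
```

Because the hypothetical outcomes are generated without noise, the fit recovers the intercept and both slopes used to construct them. With only eight study groups, as in Table 4, real estimates of this kind would be imprecise.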

Risk of bias

The response rate was reported in 20 of the 53 studies (38%), with the average rate being high (69%, ranging from 21% to 100%). The uncertainty about the remaining response rates entails a risk of response bias.

Discussion

We found that patient ratings of practitioner empathy are highly variable, with some practitioners being reported to express empathy much less effectively to patients than others. Female practitioners, allied health professionals, those who spend more time with patients, and practitioners from Australia, the US, and the UK seem to display empathy more effectively than other practitioners. In addition, the average CARE score we identified was low in comparison with normative values, falling in the lowest 5% of CARE scores measured by the developers of the questionnaire [27]. The highly variable scores we found are likely to be associated with variable patient outcomes [9,10,11, 30].

Strengths and limitations

This is the first systematic review to investigate the extent to which healthcare practitioners are empathic. Another strength is that it used the only validated patient-rated measure of practitioner empathy. As such, it provides a good indication of the differences in perceived empathy across gender, disciplines, and countries.

There are also several potential limitations. First, our method probably underestimated the difference between female and male practitioners. Because studies with a majority of female practitioners showed higher patient-rated empathy, it is reasonable to assume that the difference would have been greater had all the practitioners in those studies been female. In addition, in the context of this observational research we do not know whether the additional time caused female practitioners to be more empathic, or whether female practitioners’ higher empathy caused them to spend more time with patients, or whether these two factors cannot be separated. Second, response bias [26, 31, 32] could have affected the results. Patients who know they are rating their practitioners may wish to please their practitioners, [33] for example by giving them higher scores than they otherwise would [31, 32]. The lack of response rate reporting in most of the studies makes the extent of this problem unclear. Furthermore, selection bias might have influenced the results: the CARE questionnaire could have been delivered in areas where the empathy of the practitioners was believed to be anomalous (either particularly high or particularly low). Next, the comparison between countries could have been influenced by the number of studies per country. Specifically, some of the countries with low scores had very few studies (Croatia had 1, Ethiopia had 2, and India had 1). Moreover, in spite of validation of CARE translations, patients in different countries may have divergent prior expectations and beliefs about what it means to be an empathic practitioner. Finally, the comparison with normative values (resulting in the average score we found being in the lowest 5%) is problematic. In spite of being relatively low, the average score is still above 40. Further work needs to be done to investigate the meaning of average CARE scores.

Conclusions

Implications for clinical practice and clinical research

Patient-rated empathy of healthcare practitioners is low (on average) in comparison with normative scores, and highly variable. Given the likely association between practitioner empathy and patient outcomes, further research is now warranted to investigate how these findings can be used to improve patient care. Future reports of the CARE questionnaire should include all the potentially relevant factors we have identified here, especially details about response rates, and also consultation duration, gender, experience of practitioners, and other demographic details of patient raters and practitioners.