Background

With the growing incidence of cancer in an aging population, an increasing number of advanced cancer patients are not able to report their symptoms and many depend upon health care providers for symptom assessment [1, 2]. Accurate evaluation of symptom intensities is critical for optimal care and ultimately for alleviating symptom burden and improving the quality of life of cancer patients. An inappropriate interpretation of symptoms may lead to overdose of medication (e.g. of an opioid), or conversely may leave the patients undertreated [3, 4]. There are several tools for assessment of symptoms in oncology [5, 6]. Nevertheless, clinicians have called attention to the lack of consensus about a brief, reliable, bedside symptom assessment tool [7]. The absence of such a tool may be an obstacle for routine assessment, optimal symptom control and ultimately for improvement of quality of life in patients with cancer [3, 5, 8-10].

Patient ratings of symptoms are generally considered the "gold standard" [11, 12]. However, sometimes patients are unable to report on their symptoms, e.g. when they suffer from confusion [13] or communication deficits [11], when the symptom distress is severe [11], or if physical or cognitive disability make them unable to complete assessments [4]. These patients are often excluded from studies and there is no consensus about how their symptoms should be assessed in order to guide clinical decisions. For patients unable to report their symptoms we need to rely on proxy assessments. The usefulness of such proxy reports is dependent on their agreement with patients' report of symptom intensity.

Studies on agreement between patient reports and proxy observations have been performed in a wide range of patient populations [14-16], and in different settings, including palliative care [3, 4, 17-29]. Studies conducted among cancer patients and their health care providers found a poor and variable agreement between patients and staff [3, 17-19, 22, 24, 25]. Some reports have shown that providers tend to underestimate the intensity of physical symptoms, such as pain, and overestimate anxiety, depression and distress [14].

The findings of studies investigating agreement on single symptoms and exploring which clinical factors cause lack of agreement between self-reports and observer ratings are inconsistent [18, 28-30]. Patients' age, gender, tumor site, place of evaluation, Karnofsky performance status, time from admittance into palliative care and time from diagnosis are among the factors studied previously [18, 28-30]. Most studies included a limited number of patients, focused on only a few symptoms related to quality of life and were performed in a single department. To use provider reports of patients' symptoms to guide clinical decisions, we need to know which symptoms can be reliably reported by health care providers and whether certain subgroups of patients are at risk of symptom over- or underestimation by providers.

Therefore, a large-scale study including patients treated with opioids for cancer pain from 11 different European countries was conducted. One of the objectives of the study was to examine the extent of agreement between patient and health care provider assessments, including the association with demographic- and disease-related factors.

Methods

Patients

Patients treated in 17 different palliative care centers, outpatient clinics, general or cancer wards in 11 countries were eligible for The European Pharmacogenetic Opioid Study (EPOS), a multicenter, multinational study performed between February 2004 and April 2008. EPOS was designed to study symptoms experienced by cancer pain patients and the pharmacogenetics of opioids in these patients. The examination of agreement on symptom assessments between patients and health care providers was one of the pre-specified objectives of the study. All patients considered for inclusion were 18 years of age or older, had a verified diagnosis of malignant disease and had used a regularly scheduled opioid treatment corresponding to step III on the World Health Organization's analgesic ladder for cancer pain for at least three days. Patients could only participate in the study once. The exclusion criterion was insufficient command of the language spoken in the study center. Patients were informed of the study by their health care provider and gave written informed consent. The study was performed according to the Declaration of Helsinki and approved by each country's or study center's ethical committee.

Symptom assessments

Patients and health care providers (nurses or physicians) assessed symptoms independently on the same day. The symptoms assessed were pain, fatigue, generalized weakness, anorexia, depression, constipation, sleep disturbance, dyspnea, nausea, vomiting and diarrhea. The health care providers registered symptom severities during the past 24 hours on a four-point verbal rating scale with the descriptors none, mild, moderate and severe. Patients reported their symptoms during the past week by answering the European Organization of Research and Treatment of Cancer Core Quality of Life Questionnaire (EORTC QLQ-C30) version 3 [31]. Symptoms were assessed on a four-point verbal rating scale with the descriptors "not at all", "a little", "quite a bit" and "very much". All symptoms except fatigue were assessed by single items of the EORTC QLQ-C30. Fatigue was assessed with a scale ranging from zero to 100 comprising three items: "Did you need to rest?", "Have you felt weak?" and "Were you tired?". None of these items was identical to the fatigue item providers responded to. Therefore, the entire fatigue scale was recoded into a four-point scale with increasing symptom intensity from 1 to 4: 0-24 was recoded into one, 25-49 was recoded into two, 50-74 was recoded into three and 75-100 was recoded into four.
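The recoding of the fatigue scale can be illustrated with a minimal sketch. The function name `recode_fatigue` is hypothetical, and the handling of non-integer scores between the stated ranges (for example 24.5) is an assumption: the sketch uses half-open intervals, which the text does not specify.

```python
def recode_fatigue(score):
    """Map a 0-100 fatigue scale score to the four-point scale (1 to 4).

    Assumption: scores falling between the stated integer ranges are
    assigned using half-open intervals [0, 25), [25, 50), [50, 75), [75, 100].
    """
    if not 0 <= score <= 100:
        raise ValueError("fatigue scale score must lie between 0 and 100")
    if score < 25:
        return 1  # stated range 0-24
    if score < 50:
        return 2  # stated range 25-49
    if score < 75:
        return 3  # stated range 50-74
    return 4      # stated range 75-100
```

After recoding, the fatigue score is on the same 1 to 4 scale as the single-item symptoms and can be compared with the provider rating in the same way.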

Factors associated with agreement

The possible association between patient-provider agreement and several demographic- and disease-related factors such as age, gender, Karnofsky Performance Status, tumor site, time since diagnosis and affiliation to department is incompletely understood [18, 28-30]. The existing literature suggests that agreement might be influenced by such factors (i.e. less agreement as performance status worsens [28], in male patients [29], in younger patients [30], in hospitalized patients [30], in patients with certain cancer diagnoses [30]) but the findings are inconsistent. Therefore, it was decided to perform exploratory analyses of demographic- and disease-related factors thought to be associated with patient-provider agreement based on existing literature and clinical experience. The patient characteristics age, gender, body mass index (BMI), previous or ongoing abuse of alcohol or drugs (yes or no), cancer diagnosis and presence of metastasis were registered by a health care provider. Use of medical treatment including opioids and chemotherapy was obtained from the patients' medical records. The health care providers also noted the time since diagnosis in months, time since start of opioid treatment in months, status of present opioid treatment (as recently initiated and still under titration or as stable dosing) and whether the patient was treated as an inpatient or outpatient. Health care providers assessed cognitive function by performing the Mini Mental State (MMS) examination, scoring the patient between zero and 30, where scores of 23 or less indicated cognitive failure [13, 32, 33]. In addition, health care providers assessed functional status by the Karnofsky Performance Status, scoring the patient between zero and 100 [34].

Statistics

Collection and organization of the data were performed by the Pain and Palliation Research Group, Norwegian University of Science and Technology. The statistical software SPSS for Windows version 16.0 was used for all statistical analyses. If data were missing from patients or health care providers on a single symptom, these data were eliminated from analyses involving that variable. No imputations were performed.

Some patients did not report their own symptoms. Not self-reporting was defined as answering fewer than half (< 15) of the questions of the EORTC QLQ-C30. Demographics were reported as absolute numbers, percent, mean and standard deviation (SD). Mann-Whitney U and Fisher's exact tests (for 2 × 2-tables) were used to compare respondents and non-respondents.

The level of agreement between patient- and health care provider assessments was examined by four different approaches. First, agreement at the group level was addressed by the Wilcoxon Signed-Rank test comparing intensity of symptoms as assessed by patients and health care providers. Second, difference scores for each symptom (difference score = health care provider score minus patient score) were calculated to examine good agreement (difference score within ± 1), underestimation (difference score ≤ -2) and overestimation (difference score ≥ 2) by health care providers, compared to patient scores which were considered as the gold standard. Variants of this approach have been used in several previous studies [16, 19, 27-29].
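The difference-score classification can be sketched as follows. The function names and the example pairs are hypothetical, and the sketch assumes that both patient and provider descriptors are coded 1 to 4 in order of increasing intensity.

```python
from collections import Counter

CATEGORIES = ("underestimation", "good agreement", "overestimation")

def classify_pair(provider_score, patient_score):
    """Classify one assessment pair by its difference score.

    Difference score = health care provider score minus patient score,
    with the patient score treated as the gold standard.
    """
    diff = provider_score - patient_score
    if diff <= -2:
        return "underestimation"
    if diff >= 2:
        return "overestimation"
    return "good agreement"  # difference score within +/- 1

def summarize(pairs):
    """Return the percentage of pairs in each agreement category."""
    counts = Counter(classify_pair(prov, pat) for prov, pat in pairs)
    total = len(pairs)
    return {cat: 100 * counts[cat] / total for cat in CATEGORIES}

# Hypothetical (provider, patient) pairs, for illustration only.
example_pairs = [(1, 3), (2, 2), (3, 4), (4, 1)]
print(summarize(example_pairs))
```

With this definition, identical ratings and ratings that differ by one response category both count as good agreement, so the percentages describe only discrepancies of two or more categories.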

Third, as a measure of agreement on symptom scores between patients and health care providers at the individual level, intraclass correlation coefficients (ICCs) were computed by a two-way mixed effect model and an absolute agreement definition. ICCs are reported with 95 percent confidence intervals and serve as indicators of chance-corrected agreement at the individual level [35]. Guidelines used for interpretation of ICCs were based on studies demonstrating that for ordinal data ICCs are mathematically equivalent to the weighted kappa statistic [36, 37] and the ranges used in previous studies of cancer patient and proxy agreement [17, 27, 38]: ≤ 0.40 poor to fair agreement, 0.41-0.60 moderate agreement, 0.61-0.80 good agreement and 0.81-1.00 excellent agreement. Fourth, the data were examined visually by plotting the differences between the two measures (difference score = health care provider score minus patient score) against their individual means in Bland-Altman plots. These plots are useful for evaluating whether there is any systematic difference between the methods or whether the degree of random variation changes with the mean value [39].
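As an illustration of the third approach, the sketch below computes a two-way, absolute-agreement ICC from the mean squares of a two-way layout. It is not the SPSS procedure used in the study: the single-measures form is an assumption (the text does not state whether single or average measures were reported), confidence intervals are omitted, and the function names are hypothetical.

```python
def icc_absolute_agreement(ratings):
    """Single-measures, absolute-agreement ICC for an n x k rating table.

    ratings: list of rows, one per patient, e.g. [patient_score, provider_score].
    Formula: (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n),
    where MSR, MSC and MSE are the row, column and residual mean squares.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

def interpret_icc(icc):
    """Apply the interpretation ranges given in the text."""
    if icc <= 0.40:
        return "poor to fair"
    if icc <= 0.60:
        return "moderate"
    if icc <= 0.80:
        return "good"
    return "excellent"
```

Because the absolute-agreement definition penalizes systematic shifts, a provider who consistently rates one category away from the patient obtains an ICC below 1 even though the two sets of ratings are perfectly correlated.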

To address whether demographic- and disease-related factors were associated with the agreement between patients and providers, analyses on how these influenced the number and percentage of comparisons where there was good agreement (difference score within ± 1), where the health care provider underestimated (difference score ≤ -2) and overestimated (difference score ≥ 2) were performed [19, 27]. Chi-Square tests were used to investigate whether agreement was significantly associated with these previously mentioned variables.
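The chi-square statistic for such a comparison can be sketched in a few lines. The function name and the example counts are hypothetical and are not study data; the sketch returns only the statistic and its degrees of freedom, and obtaining a p-value would additionally require the chi-square distribution from a statistics package.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table: list of rows of observed counts, for example one row per level
    of a patient factor and one column per agreement category.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of factor and agreement.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Hypothetical counts: rows = inpatients, outpatients;
# columns = underestimation, good agreement, overestimation.
example_table = [[120, 850, 30], [40, 450, 10]]
print(chi_square_statistic(example_table))
```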

Results

Patients

The EPOS included 2294 patients. Their mean age was 62 (SD 12) years. The mean Karnofsky Performance score of this patient population was 59 (SD 17), meaning that most patients required some help, but could take care of most personal requirements [34]. The mean total Mini Mental State (MMS)-score was 27 (SD 3). On average it was 32 months (SD 46) since the cancer was diagnosed and five months (SD 11) since opioid treatment started. Men (52%) and women (48%) were equally represented in the study. The majority was hospitalized (81%), Caucasian (97%) patients with one or more metastases (83%). Cancer diagnoses, site of metastases and countries are given in Table 1.

Table 1 Demographics of patients

Three hundred and fifty-six patients answered fewer than 15 questions in the EORTC QLQ-C30 and were categorized as non-respondents. In addition, observer rating of symptoms was missing for five patients, meaning that this study yielded 1933 patient and proxy assessment dyads. In general, patients not giving a self-report of symptoms were older, had lower Karnofsky Performance Status and lower scores on MMS, had been diagnosed with cancer more recently, had started opioid treatment more recently and were more often hospitalized than those who completed the assessment form (Table 1). The most common reason for not completing all assessments was that the patients were too ill.

Agreement between patients and health care providers

Nurses assessed the symptoms of 994 patients (51%), physicians assessed the symptoms of 735 patients (38%) and data about the health care provider profession was missing for 204 patients (11%). Health care providers systematically reported the percentages of moderate or severe symptoms as lower than the patients did. The percentages of patients with symptoms assessed as moderate or severe by patients and providers respectively, were for pain 67 and 47, for fatigue 71 and 54, for generalized weakness 65 and 47, for anorexia 47 and 25, for depression 31 and 17, for constipation 45 and 30, for poor sleep 32 and 21, for dyspnea 30 and 16, for nausea 27 and 14, for vomiting 14 and 6 and for diarrhea 14 and 6 (Table 2). Health care providers underestimated symptom intensity at the group level (p < 0.001 for all symptoms) (Table 2).

Table 2 Prevalence and intensity of symptoms as rated by patients and health care providers

Direct under- and overestimations were calculated on the basis of difference scores. Again, underestimation of symptoms by health care providers was far more common than overestimation (Table 3). For instance anorexia was underestimated in 20 percent of assessment-pairs and overestimated in two percent. In a majority of patient-provider assessment pairs (79 to 93 percent) the responses were identical or differed by only one response category. The highest levels of agreement were found for pain, vomiting and diarrhea, where 90 percent of assessment-pairs showed complete agreement or differed by only one response category. Fatigue, anorexia and constipation were the symptoms most frequently underestimated by health care providers.

Table 3 Under- and overestimation of symptoms by health care providers as compared to patients

At the individual level, the patient-provider agreement was evaluated for each symptom by the ICC (Table 4). Agreement for anorexia was poor to fair (ICC < 0.4), whereas the ICC of all other symptoms was of a moderate magnitude (ICC 0.4-0.6). The individual differences between the two assessments (difference score = health care provider score minus patient score) were assigned as the ordinate (y-axis) and the individual means as the abscissa (x-axis), in Bland-Altman plots (Figure 1 and Figure 2). The size of the markers reflects the number of individual observations and only the line of equality (difference = 0) is shown. The Bland-Altman plot of pain showed that the best agreement was found at intermediate levels of symptom intensity (Figure 1a). The Bland-Altman plots of fatigue and generalized weakness demonstrated increasing agreement with increasing symptom intensity (Figure 1b and 2a). For constipation, anorexia, depression and poor sleep (Figure 1c, 2b, 2c and 2d) the Bland-Altman plots showed a fair agreement at all levels of symptom intensities, with the majority of agreement-pairs within ± 1 difference scores. For the symptoms that were less frequent, like vomiting, dyspnea, nausea and diarrhea (Figure 1d, 2e, 2f and 2g) most patients and health care providers agreed on absence of the symptom.
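The coordinates and marker sizes of such a plot can be derived as in the following sketch. The function names and the example pairs are hypothetical; the sketch only prepares the plotted quantities and does not draw the figure.

```python
from collections import Counter

def bland_altman_points(pairs):
    """Count observations at each (mean, difference) coordinate.

    pairs: iterable of (provider_score, patient_score).
    x = mean of the two scores, y = provider score minus patient score.
    The count at each coordinate corresponds to the marker size.
    """
    return Counter(((prov + pat) / 2, prov - pat) for prov, pat in pairs)

def mean_difference(pairs):
    """Mean difference score; a negative value indicates provider underestimation."""
    diffs = [prov - pat for prov, pat in pairs]
    return sum(diffs) / len(diffs)

# Hypothetical (provider, patient) pairs, for illustration only.
example_pairs = [(2, 3), (2, 3), (4, 4), (1, 3)]
for (x, y), size in sorted(bland_altman_points(example_pairs).items()):
    print(f"mean={x}, difference={y}, observations={size}")
```

Because both ratings take only four values, many pairs share the same coordinate, which is why the number of observations is encoded in the marker size rather than shown as overlapping points.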

Table 4 Difference scores and intraclass correlation coefficients for patient-provider-pairs of symptom severity scores
Figure 1

Bland-Altman plots, one for each symptom (see also Figure 2). The difference between patient and provider score (difference score = health care provider score minus patient score) plotted against mean symptom score. The size of the markers reflects the number of individual observations and only the line of equality (difference = 0) is shown. Negative differences mean that providers underestimated the symptom. The larger the markers on one side of the line of equality, the larger the tendency towards a systematic difference between assessments (i.e. more observations below the line suggest that providers had a negative bias and underestimated symptom intensity). Whether differences between provider and patient assessments change with the mean value of symptom intensity is determined by looking for patterns along the x-axis. (A): Pain. (B): Fatigue. (C): Constipation. (D): Vomiting.

Figure 2

Bland-Altman plots, one for each symptom (see also Figure 1). The difference between patient and provider score (difference score = health care provider score minus patient score) plotted against mean symptom score. The size of the markers reflects the number of individual observations and only the line of equality (difference = 0) is shown. Negative differences mean that providers underestimated the symptom. The larger the markers on one side of the line of equality, the larger the tendency towards a systematic difference between assessments (i.e. more observations below the line suggest that providers had a negative bias and underestimated symptom intensity). Whether differences between provider and patient assessments change with the mean value of symptom intensity is determined by looking for patterns along the x-axis. (A): Generalized weakness. (B): Anorexia. (C): Depression. (D): Poor sleep. (E): Dyspnea. (F): Nausea. (G): Diarrhea.

Factors associated with symptom agreement

The 1933 patient-provider dyads assessing 11 symptoms yielded 21 263 paired symptom observations. The total number of dyads differed somewhat due to missing data (Table 5). For all symptom assessment pairs, the agreement between patients and their health care provider was not associated with the patients' age, gender, BMI, previous abuse of alcohol, use of chemotherapy during the past 24 hours or presence of metastatic disease (Table 5). Karnofsky Performance Status had a substantial effect on agreement. Large patient-provider discrepancies were more prevalent in patients with a very low (≤ 25) or a moderately impaired (51 to 75) performance status. Higher MMS-scores (> 23) were associated with more underestimation of symptom intensity by health care providers as compared to patients with lower MMS-scores (≤ 23). Health care providers more often underestimated symptoms in recently diagnosed patients and in patients who received a recently initiated opioid still being titrated. Patient-provider agreement was better among outpatients than hospitalized patients. Overestimation of symptoms by health care providers was more common in patients with a history of drug abuse (3%) than in patients with no such history (1%). Both under- and overestimation were more common when the symptoms were assessed by physicians (11% and 3% respectively), as compared to assessment by nurses (10% and 2% respectively). The frequency of underestimation varied between cancer diagnoses and countries (Table 5).

Table 5 Factors associated with agreement across all pairs of symptom observations (N = 21263)a

Discussion

In this large, European cross-sectional study a moderate agreement on symptom assessment between cancer patients and their health care providers was observed (ICC range 0.38 to 0.59). Health care providers underestimated symptom intensity in approximately 10% of patients, with some variations between cancer diagnoses and substantial variations between countries. Patients with low Karnofsky Performance Status, MMS-scores of 24 or higher, who were hospitalized, recently diagnosed or still undergoing opioid titration, were at increased risk of symptom underestimation by health care providers.

Our examinations of the agreement between patient and provider assessments showed that health care providers tended to underestimate all symptoms and that underestimation was present in 6.5 to 19.4% of the patient-provider assessment pairs. The highest rates of underestimation were found for anorexia (19.4%) and fatigue (13.4%), which were among the most "subjective" symptoms investigated. As similar proportions of underestimation have been demonstrated previously [15, 18, 28, 29], influence from subjective terminology and recoding of the fatigue scale was considered as less likely and the findings may rather reflect the known trend towards better agreement when the information assessed is observable and concrete [14]. This was also illustrated by the low level of underestimation (6.5%) for vomiting. Agreement at the individual level was only moderate as measured by ICCs ranging between 0.38 and 0.59. The ICC, which is a suitable statistical measure of agreement correcting for the chance-expected agreement, and the presentation of agreement as absolute numbers and percentages in Table 3 are complementary approaches as they differ both in perspective and operationalization [27].

The present findings of provider underestimation and moderate ICCs were in contrast to findings in primary care where general practitioners and nurses overestimated symptoms and ICCs were higher [19]. Also, in a review of health care providers' role for evaluation of patients with chronic diseases including cancer, Sneeuw et al. reported that providers tended to rate patients as having more symptomatology than the patients did themselves [15]. However, the quality of the studies comparing patient and provider assessments was assessed as rather poor [15]. Other studies have reported that health care providers tend to underestimate physical symptoms [4, 20, 23, 24, 28], whereas anxiety, depression and distress of symptoms are overestimated [14]. In the present study there was no directional difference between agreement on physical and psychological symptoms. However, the Bland-Altman plots of agreement by level of symptom intensity showed differences between symptoms. For fatigue and generalized weakness, the agreement increased at higher symptom intensities, whereas the agreement on less prevalent symptoms was best at low symptom intensities. The latter finding was in line with observations in primary care where symptom agreement was best for absent symptoms [19]. This suggests that the differences in agreement between studies are likely to be influenced by the number of patients with moderate or severe symptoms included in each study.

Symptom underestimation by health care providers was associated with low Karnofsky Performance Status, high MMS-score, hospitalization, recently diagnosed cancer and ongoing opioid titration. Of the previous studies examining possible factors associated with patient-provider agreement one study found no associations [18], and three studies found better agreement in subgroups of patients characterized by certain demographic- and disease-related factors [28-30]. However, the findings were not conclusive because the analyses were dispersive, the findings were difficult to explain clinically and there was a lack of consistency. The significant relationship between Karnofsky Performance Status and agreement found in the present study was in line with the existing literature, but there were no linear [28, 29] or U-shaped [11, 27] correlations as described previously. To our knowledge, an association with MMS-score has not been documented before. The association might reflect that patients with low MMS-scores are identified as generally more influenced by disease and therefore the observer ratings agreed more closely with the patient ratings. The status of patients hospitalized, recently diagnosed or undergoing opioid titration can at a group level be considered as fluctuating. Therefore, the finding of more discrepancies between patient and provider assessments in these patients was not surprising. This finding is also in line with a previous study showing less agreement in hospitalized patients [30]. Based upon these results the presence of such factors could alert clinicians to recognize that health care providers may underestimate symptoms in these patients. The factors associated with agreement were broadly similar to those characterizing the patients not able to complete the EORTC questionnaire, possibly indicating that patients unable to give self-reports are at risk of underestimation of symptoms by health care providers.

Drug abuse and cancer diagnosis were also significantly related to agreement, but the magnitude of differences was considered as not clinically important. As seen in previous studies, the nurse assessments agreed more closely with patient ratings than physician assessments [4, 19]. The substantial variations in agreement observed between countries might be of relevance, reflecting that factors influencing either patient- or provider reports may vary between countries. However, the study design does not allow us to reach a firm conclusion as to whether this variation is truly related to country or whether it is also related to each specific center.

For the analyses of agreement, the patient ratings were used as a "gold standard" to which the health care provider assessments were compared. In reviews, the patients' role as a gold standard for assessment has been questioned [15, 40]. A study comparing responses from patients, physicians, nurses and significant others found that deviant scores of more than one response category were most often caused by the patients, indicating either that the quality of all proxy-derived information was poor or that patients' responses are of questionable validity [27]. Patient reports may not reflect the true experience of symptoms for various reasons, such as psychological denial, barriers towards reporting symptoms, a wish to please the nurse/doctor or an impression that emphasizing symptoms is needed to secure help from health care providers. Therefore, using a combination of patient reports (when available), the caregiver's perceptions and objective signs could be of benefit both in clinical decisions and future research [4, 7].

The findings of this study might have direct clinical implications for the care of cancer patients. Underestimation of symptoms by health care providers in cancer care might cause undertreatment of symptoms and less favorable outcome. Insight into the limitations of observer rating may improve the health care providers' ability to identify symptoms that need treatment. To increase rates of agreement, systematic screening tools used regularly or programs for further training of communication skills could be introduced [41]. Future research could address the consequences of symptom underestimation in cancer care and the importance of interventions to reduce disagreement between patients and providers. When health care provider assessments replace patient rating in those not able to report their symptoms, the identification of risk factors for erroneous assessments by health care providers is important [24]. Our findings of factors influencing agreement may be elaborated in future studies investigating whether agreement can be improved by adjusting for systematic deviations between patient and health care provider assessments in such situations.

This is to our knowledge the largest study comparing the assessment of individual symptoms between patients and health care providers in cancer care across several countries. As in other studies among advanced cancer patients, self-assessment was not possible to obtain for a considerable number of patients (N = 365). The patients not able to report symptoms had similar characteristics as those who had their symptoms underestimated by providers. Therefore, the true number of patients being underestimated is not likely to be falsely overestimated due to missing values. Patients rated their symptoms in the extensively validated EORTC QLQ-C30, whereas health care providers rated the same symptoms using a clinically applicable four-point verbal rating scale. Patients reported their symptoms during the past week, whereas health care providers assessed symptoms during the past 24 hours, a difference in time frame that might introduce differences in both prevalence and intensity of reported symptoms. Thus, it could be argued that patient- and provider-instruments differed. However, to compare the actual assessments currently performed in clinical research it was decided to apply the instrument used in most studies on self-reported symptoms in cancer care (EORTC QLQ-C30) and the bedside observer assessment tool used in recent European Association for Palliative Care (EAPC) endorsed studies [42]. The present study compared assessments by patients and providers and was not designed to compare assessments performed by subgroups of health care providers (nurses versus physicians). Future studies need to be designed specifically to investigate differences between nurses and physicians in terms of ability to assess patient symptoms as the findings of previous studies are inconsistent [4, 19, 29]. Furthermore, studies in samples of other racial/ethnic composition and studies including patients at different levels of the disease-trajectory are needed to determine the generalizability of the findings of the present study.

We recognize that this study has some limitations. In the analyses to identify risk factors of disagreement between patients and providers, pairs of assessments for all symptoms were pooled, to avoid multiple testing. This strategy did not allow us to address whether the factors were associated differently with each symptom. Furthermore, data from each center on numbers and characteristics of patients not approached or declining to take part in the study were not obtained. However, the characteristics of patients included were found to be representative for cancer patients. The pooling of data across countries may be seen as a limitation as there were substantial variations in agreement between countries, but it may also be a strength as it increases the sample size and protects against the tendency towards report of lower overall levels of patient-proxy agreement shown in smaller studies [15]. The selection of centers in each country was based upon researchers volunteering to participate in this European multi-center study. These centers are not necessarily representative for each country's general health care system. In order to accurately describe the influence of country, we believe a study should include several randomly selected centers from each country in the analysis. This was not done in our study and therefore, we believe that the data about country effect (Table 5) should be interpreted with caution and hence not included in more comprehensive analyses.

Conclusions

In this large European study, health care providers assessing cancer patients' symptoms tended to underestimate symptom intensity at the group level and the agreement with patient ratings at the individual level was moderate. The differences between patient and provider assessments may arise because providers are not able to interpret the patients' symptoms exactly or because different instruments are used for patients and health care providers. Agreement on rating of symptoms was associated with demographic- and disease-related factors. Clinicians involved in care for patients with cancer should be aware of the potential factors associated with a risk of symptom underestimation.