Background

Patient-reported outcomes (PROs) are increasingly important in oncology. The Medicare Health Outcomes Survey (MHOS) collects health-related quality of life (HRQOL) and other PRO information from respondents, which is used in the calculation of publicly reported health plan star ratings [1]. Additionally, the National Quality Forum is working to develop PRO performance measures [2]. In oncology, PRO performance measures assessing symptom management processes and outcomes are being actively explored [3, 4].

A major challenge associated with PROs is that patients may be too ill to complete questionnaires. In diseases such as cancer, severely ill patients may comprise a large portion of the study population. Evaluation of the potential for proxy reporters to answer on the patient’s behalf in cancer was primarily conducted using paired studies that include data from proxy-patient dyads [5]. The findings from paired studies evaluating proxy-patient differences and correlations in cancer suggest that proxy-patient differences tend to be small on average and correlations are at least moderate [6,7,8]. It is not known how well the findings from paired proxy-patient studies will map to unpaired studies, as patients in the latter group are unable to complete questionnaires and thus may differ in important, systematic ways from patients who are able to complete questionnaires. Furthermore, proxies will only be used in practice if patients are unable to complete questionnaires, and thus understanding the differences in this practical context is important.

This question takes on greater importance in light of the many surveys used for public reporting and health policy decision-making that employ proxies to report on behalf of an otherwise missing patient and evaluate the patient’s health status and HRQOL. These include the Behavioral Risk Factor Surveillance System (BRFSS) [9], MHOS [10, 11], and the National Health Interview Survey (NHIS) [12, 13]. Studies of survey populations in which proxies substitute for otherwise missing patients have found that proxies underestimate the prevalence of disease [14, 15] and disability [12], although casemix adjustment may be able to reduce this bias in some cases [13].

To date, few studies in cancer have evaluated the differences between proxy and patient reports in unpaired studies, where proxies are more likely to be needed and used. It is also important to determine whether any differences found can be reduced through adjustment for clinical and sociodemographic characteristics, as such characteristics are frequently collected in surveys. Because HRQOL is an important outcome in cancer, particularly advanced cancer, and a high rate of proxy use can be anticipated due to the nature of disease and treatment, understanding the size of the proxy-patient difference and the potential for mitigating it using routinely collected data is important. We therefore evaluated 1) the size and direction of the difference between proxy and patient reports of patient HRQOL in a large, population-based representative survey of cancer patients; and 2) whether this difference was affected by adjustment for frequently used sociodemographic and clinical covariates. We hypothesized that the use of proxies would be driven by poor patient health, and that proxy HRQOL responses would therefore be consistently lower than patient responses.

Methods

Study setting, participants and data sources

The Cancer Care Outcomes Research and Surveillance (CanCORS) study is a large, clinically and demographically representative [16] study of patients with incident lung or colorectal cancer. CanCORS evaluated a number of PROs, including care experience and quality rating [17, 18] and shared decision-making [19,20,21]. The design and conduct of the study have been reported previously [17, 18, 22]. Patients were enrolled from 2003 to 2005 using rapid case ascertainment from several geographic regions and health systems [18, 19]. Computer-assisted telephone interviewing was used to survey patients, or their proxies if patients were unable to respond or had died, approximately 3 to 6 months post diagnosis. If patients were not able to respond, they were asked if a proxy could answer on their behalf, and to nominate someone who was knowledgeable about their condition and care. Beyond being nominated by a patient, no further eligibility criteria were placed on proxies. Partial, brief or self-administered surveys were also offered if needed.

Sociodemographic information, the presence or absence of co-morbidities, and reports of care experiences, care quality, cancer symptoms, and health-related quality of life were solicited through the computer-assisted telephone interview. Trained abstractors extracted information on patient cancer stage from medical records [18]. If medical records were not available, American Joint Committee on Cancer (AJCC) stage or historical stage (local/regional/distant) was obtained from cancer registries. Questionnaire instruments were based on previously validated or employed instruments [22]. The American Association for Public Opinion Research [16] survey response rate was 51.0% and the cooperation rate was 59.9%. Institutional review boards at all participating institutions approved the study and written or verbal informed consent was obtained depending on the study site.

For this analysis, we restricted the study sample to patients and proxies of living patients who completed the full baseline telephone survey (Fig. 1, n = 6471). All patients in our analytic sample were alive at the time of the survey.

Fig. 1 Study Sample Selection Flow Diagram

Outcome measures/dependent variables

HRQOL in CanCORS was assessed using questions from the 12-item Short Form (SF-12, version 2) [22], a validated and widely adopted generic HRQOL tool [23]. Proxy and patient questions for the SF-12 were identical, except for patients being asked about “your” health and proxies being asked about “the patient’s” health.

The SF-12 includes 12 questions that cover eight domains: general health, physical function, role-physical, role-emotional, bodily pain, mental health, vitality, and social function (Appendix: Table 4). All but one of the SF-12 items ask respondents to refer to the past 4 weeks when answering. Items are scored on three- and five-point scales. Item responses are weighted and converted to T-scores using US general population means and standard deviations, producing two composite scales: the physical (PCS) and the mental (MCS) component summaries. Both scales are calculated using each item, albeit with different weights; thus, missingness in any one item can result in the full scale being missing for that observation. These scales range from 0 (worst) to 100 (best) and are normed to a T-score mean of 50 and a standard deviation of 10. Score differences of ½ SD (=5 points) are often treated as clinically significant [24, 25].
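The T-score metric and the ½ SD threshold described above can be illustrated with a minimal Python sketch. It is not the SF-12 scoring algorithm: the item weights used to build the PCS and MCS are not given here, so the function names and the raw-score inputs are hypothetical and serve only to show the standardization step and the 5-point rule.

```python
def to_t_score(raw, pop_mean, pop_sd):
    """Standardize a raw score against a reference population, then rescale
    so that the reference population has mean 50 and SD 10."""
    z = (raw - pop_mean) / pop_sd
    return 50.0 + 10.0 * z


def exceeds_half_sd(difference, sd=10.0):
    """Return True when a score difference is at least half an SD in size
    (5 points on the T-score metric)."""
    return abs(difference) >= 0.5 * sd
```

Under this sketch, a raw score one reference SD below the reference mean maps to a T-score of 40, and a difference of 5 points or more in either direction is flagged as meeting the ½ SD threshold.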

Independent variable and covariates: The adjustment model

In our primary analyses, our independent variable was an indicator for proxy (0/1). The use of an indicator variable for respondent status as part of a regression model is a frequently used approach for accounting for proxy-reported data [11, 26]. We also added several “standard” clinical and sociodemographic covariates, based on the casemix model used for the MHOS, a large national survey of HRQOL that allows for proxy respondents. Because MHOS is not cancer-specific, we also adjusted for cancer type (lung or colorectal) and disease curability (incurable/potentially curable). Incurable disease was defined as AJCC stage IIIB or IV, distant stage, or unstaged for lung cancer and AJCC stage IV, distant stage, or unstaged for colorectal cancer. We included this information as disease stage has been shown to be a predictor of HRQOL in patients with cancer [27, 28].
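The curability definition above amounts to a simple lookup rule, sketched here in Python. The stage labels, the dictionary, and the function name are hypothetical choices made for illustration; only the mapping of stages to incurable disease comes from the definition in the text.

```python
# Stage categories treated as incurable, following the definition in the
# text. The string labels themselves are hypothetical.
INCURABLE_STAGES = {
    "lung": {"IIIB", "IV", "distant", "unstaged"},
    "colorectal": {"IV", "distant", "unstaged"},
}


def is_incurable(cancer_type, stage):
    """Return 1 for incurable disease and 0 for potentially curable disease."""
    if cancer_type not in INCURABLE_STAGES:
        raise ValueError(f"unknown cancer type: {cancer_type!r}")
    return int(stage in INCURABLE_STAGES[cancer_type])
```

Note that the same stage can be coded differently by cancer type: AJCC stage IIIB counts as incurable for lung cancer but as potentially curable for colorectal cancer.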

Our regression models adjusted for several patient sociodemographic characteristics that have previously been shown to predict HRQOL in patients with cancer or cancer survivors: gender [27, 29], marital status [30], race/ethnicity [30], age [29], and educational attainment [31]. All were solicited during the telephone interview and provided either by the patient, describing themselves (in a patient interview), or by the proxy, describing the patient (in a proxy interview). Understanding variation in patient experience by race/ethnicity was a goal of CanCORS [18]. Race/ethnicity was collected using the U.S. Census variables. For this analysis, race/ethnicity was included for consistency with MHOS casemix [32]. We also adjusted for CanCORS study site. Separate models were estimated for the two primary outcomes (PCS and MCS). We also compared adjusted and unadjusted proxy-patient differences for each individual subscale (general health, physical function, role physical, role emotional, bodily pain, mental health, vitality, and social function). Because all subscales are combined in the PCS and MCS calculation, we evaluated the subscales to ensure that no single subscale had a disproportionate influence on our results.

Finally, we adjusted for the presence of comorbidities, as comorbidities have also been shown to predict HRQOL in patients and survivors [28, 33, 34]. Our analyses adjusted for each of the following patient comorbidities: coronary artery disease (heart attack and/or bypass), heart failure, stroke, arterial bypass, lung disease (asthma, bronchitis, emphysema, or other chronic lung conditions), diabetes/high blood sugar, kidney problem, depression (or other emotional, nervous, or psychiatric problems), previous cancer, and hospitalization within the last year. In patient surveys, these comorbidities were reported by the patient [35], but in proxy surveys the proxy reported whether or not the patient had the comorbidities in question.

Although we felt that it was possible that proxy characteristics could influence their reports of patient HRQOL, most surveys that collect proxy data do not collect proxy characteristics or adjust for them in models. Thus, to be consistent with our goal of examining proxy-patient differences after adjustment for frequently used covariates, we did not include proxy characteristics in our models. However, to provide context we report the most common types of proxy-patient relationships, determined via the proxy questionnaire.

Statistical analyses

The PCS and MCS were modeled separately for all analyses. Unadjusted patient-proxy differences in mean HRQOL scores were obtained using t-tests. Multivariable linear regression models with all independent variables were used for adjusted analyses. Predicted marginal means were calculated for proxies and patients for each outcome. The assumptions of the linear regression models were evaluated using residuals vs predicted plots, Q-Q plots of residuals, and Cook’s d values. Between-respondent comparisons of covariates were conducted using chi-square tests. All significance tests were performed at the α = 0.05 level, and all analyses were conducted using SAS v9.4.
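The contrast between the unadjusted and adjusted comparisons can be illustrated with a small, self-contained Python sketch. It is not the study's SAS code: the data are synthetic and noise-free, a single covariate (incurable disease) stands in for the full covariate set, and the helper `ols` and all numeric values are hypothetical.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def mean(values):
    return sum(values) / len(values)


# Synthetic respondents as (proxy, incurable) pairs. Proxies are more
# likely to report on a patient with incurable disease (40% vs 30%).
rows = [(0, 0)] * 70 + [(0, 1)] * 30 + [(1, 0)] * 12 + [(1, 1)] * 8
# Hypothetical outcome: proxy status lowers the score by 6 points and
# incurable disease lowers it by 4 points.
y = [45.0 - 6.0 * proxy - 4.0 * incurable for proxy, incurable in rows]

# Unadjusted difference: simple difference in group means.
proxy_scores = [yi for (proxy, _), yi in zip(rows, y) if proxy == 1]
patient_scores = [yi for (proxy, _), yi in zip(rows, y) if proxy == 0]
unadjusted_diff = mean(proxy_scores) - mean(patient_scores)

# Adjusted difference: coefficient on the proxy indicator (0/1).
X = [[1.0, proxy, incurable] for proxy, incurable in rows]
intercept, adjusted_diff, incurable_coef = ols(X, y)
```

In this toy data set the unadjusted difference is about -6.4 points, while the coefficient on the proxy indicator is -6.0 points. The gap between the two reflects the part of the raw difference that is attributable to the covariate rather than to respondent type.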

Missing data

Multiple imputation (MI) was used for missing data. Missing data in CanCORS were imputed using sequential regression multiple imputation in IVEware [36]. The coefficient of determination (R²) was estimated in each of the imputed datasets separately and combined using Harel’s formula [37, 38]. The contribution of variables to the model was evaluated using the multiple partial F-test for MI data [39]. These calculations were performed in R Studio (version 3.2.2). We defined the analytic cohort as respondents with complete covariates. Responses such as “not applicable” or “refused” were treated as incomplete and excluded, resulting in an analytic cohort of N = 6426.
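Combining the coefficient of determination across imputed datasets can be sketched as follows. This is a minimal Python illustration, not the R code used in the study. It assumes the pooling rule usually attributed to Harel: take the square root of each imputed-data estimate, apply Fisher's z transformation, average, back-transform, and square. The function name and any input values are hypothetical.

```python
import math


def pool_r_squared(r2_values):
    """Pool the coefficient of determination across imputed datasets using
    a Fisher z transformation of its square root (assumed pooling rule).

    Each value must lie in [0, 1); math.atanh raises ValueError at 1.
    """
    if not r2_values:
        raise ValueError("need at least one imputed-data estimate")
    z_values = [math.atanh(math.sqrt(r2)) for r2 in r2_values]
    z_bar = sum(z_values) / len(z_values)
    return math.tanh(z_bar) ** 2
```

When every imputed dataset yields the same estimate, the pooled value equals that estimate; otherwise it falls between the smallest and largest of the inputs.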

For individual items in the SF-12, any non-numeric or “n/a” response is treated as missing, because it cannot be validly summed as part of the composite score. These items are excluded from the calculation, resulting in missing scales for the respondents [40]. Within the analytic cohort, excluded items were infrequent; N = 6422/6426 (99.9% of the analytic cohort and 99.2% of the study sample) had valid, numeric responses for all items and the corresponding PCS and MCS scales.
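The item-level completeness rule and the resulting proportions can be expressed as a short Python sketch. The function name and the representation of responses as a list are hypothetical; the counts are those reported above.

```python
import math


def has_valid_scales(item_responses, n_items=12):
    """Return True only when every SF-12 item has a numeric response.

    Any missing or non-numeric item (for example "n/a" or "refused")
    makes both composite scales missing for that respondent.
    """
    if len(item_responses) != n_items:
        return False
    return all(
        isinstance(v, (int, float))
        and not isinstance(v, bool)
        and math.isfinite(v)
        for v in item_responses
    )


# Share of respondents with valid PCS and MCS scales, from reported counts.
share_of_analytic_cohort = 100 * 6422 / 6426
share_of_study_sample = 100 * 6422 / 6471
```

Rounded to one decimal place, the two shares reproduce the 99.9% and 99.2% reported above.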

Sensitivity analyses

We conducted several sensitivity analyses. First, we evaluated the impact of including unstaged patients in our models by estimating the models with these patients excluded. Second, we analyzed the robustness of the results when variables were added or removed from the regression models. For the robustness check, we evaluated the impact of adding wealth (number of months patients could live on their savings), to approximate income; this variable has been used previously in CanCORS analyses [41, 42]. We also evaluated the impact of adding survey timing relative to diagnosis. Next, we evaluated the impact of removing the following variables: 1) all co-morbidities, including hospitalization; 2) hospitalization only.

Third, we implemented our primary models using complete case analysis for missing data rather than the multiply imputed datasets; the complete case analysis approach has been used in several studies evaluating HRQOL in MHOS respondents with cancer [43,44,45]. Fourth, we evaluated the impact of proxy reports of patient comorbidities on the relationship between comorbidities and HRQOL outcomes in the primary model. Because proxies may report patient conditions such as depression differently than patients [46, 47], we compared the coefficients for all comorbidities in the primary model under three scenarios: 1) with both patient- and proxy-reported data and an indicator variable for proxy status; 2) with both patient- and proxy-reported data and no indicator variable for proxy status; 3) with patient-reported data only. Lastly, as noted previously we also checked that our results were not disproportionately influenced by a single subscale by testing proxy-patient differences at the subscale level.

Results

Among 6471 participants, 1011 (16%) were proxies. Most proxies were the patient’s spouse/partner (50%) or child (36%). Patients with proxy reporters had similar proportions of the different cancer types but were more likely to have incurable disease (40.6% vs 30.8%), were older, and were less educated (Table 1). Patients with proxy reporters also had more comorbidities and were more likely to have been hospitalized in the preceding year (26.0% vs 19.9%). For comorbidities, the greatest difference was observed for stroke (proxy report: 17.4%, patient report: 8.3%).

Table 1 Study sample characteristics: Observed data (N = 6471)

In unadjusted analyses, proxy-reported PCS and MCS scores were clinically and statistically significantly lower than patient-reported scores. Proxy-reported PCS scores were 6.65 points lower on average (95% CI −7.42 to −5.88): the average proxy-reported score was 33.56 (SE = 0.36), versus an average patient-reported score of 40.21 (SE = 0.15). Proxy-patient differences for the MCS were slightly smaller but still large: average proxy scores were 5.96 points lower (95% CI −6.74 to −5.19), with mean scores of 44.73 (SE = 0.36) for proxies and 50.69 (SE = 0.16) for patients. For the subscales, proxy-patient differences ranged from −5.4 points (bodily pain) to −8.9 points (physical function); differences of 5 points or greater were seen in either the point estimate or confidence interval of all subscales.
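As a rough arithmetic check, the unadjusted differences and their confidence intervals can be approximately reconstructed from the reported means and standard errors. The Python sketch below assumes independent groups and a normal approximation (z = 1.96), which is not necessarily how the reported intervals were computed; because the inputs are rounded, the results can differ from the reported limits in the second decimal place. The function name is hypothetical.

```python
import math


def difference_with_ci(mean_a, se_a, mean_b, se_b, z=1.96):
    """Difference in means (a minus b) with a normal-approximation 95% CI,
    treating the two groups as independent."""
    diff = mean_a - mean_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, diff - z * se_diff, diff + z * se_diff


# Proxy minus patient, using the means and standard errors reported above.
pcs = difference_with_ci(33.56, 0.36, 40.21, 0.15)
mcs = difference_with_ci(44.73, 0.36, 50.69, 0.16)
```

Both reconstructed intervals agree with the reported ones to within about 0.01 points, and both lie entirely below zero.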

Adjustments using sociodemographic and clinical covariates had minimal effect on proxy coefficient estimates, and proxy-patient conditional differences remained clinically and statistically significant (Table 2). Similarly, for all subscales, conditional differences of at least 5 points remained after adjustment. Model diagnostics did not indicate severe violations of the multivariable linear regression assumptions.

Table 2 Proxy-patient differences for health-related quality of life outcomes

In sensitivity analyses (Table 2), excluding unstaged patients had only small effects on the average proxy scores, with changes of 0.10–0.30 points depending on the outcome. Adjusting for wealth did not substantially reduce the gap between proxy and patient scores in either the PCS or the MCS model, although the coefficient for this variable was statistically different from zero for both outcomes (F-test p < 0.05 for both). With wealth included, proxy-patient conditional differences for both outcomes remained clinically and statistically significant: proxy PCS scores were 5.72 points lower on average (95% CI −6.51 to −4.94) and proxy MCS scores were 5.77 points lower on average (95% CI −6.56 to −4.98). Similarly, adjusting for survey timing had a minimal effect on conditional proxy-patient differences. The coefficient for survey timing was statistically different from zero for PCS (F-test p < 0.05) but not for MCS (F-test p > 0.05). For both outcomes, the point estimate for the average conditional difference changed only slightly; the proxy-patient conditional difference increased by 0.13 points for PCS and 0.04 points for MCS.

Sensitivity analyses that excluded comorbidities and hospitalization (Table 3) showed that models including these covariates were significantly more effective in predicting both outcomes (F-tests of p < 0.05 for both outcomes for these analyses). Excluding co-morbidities and hospitalization exacerbated proxy-patient differences for both PCS and MCS outcomes; proxy-patient conditional differences increased by 0.53 points for PCS and 0.71 points for MCS. Including co-morbidities but not hospitalization resulted in an increased proxy-patient conditional difference of 0.04 points for PCS and 0.02 points for MCS.

Table 3 Association of comorbidities and hospitalization with HRQOL outcomes: impact of proxy data

The results from the primary analysis were similar when complete case analysis rather than analysis of multiply imputed data was used (data not shown). Finally, the associations between comorbidities and HRQOL scores were similar whether or not proxy data were included (Table 3).

Discussion

In a large national survey of cancer patients in which proxy reports were used to substitute for unavailable patient reports, proxy and patient reports of patient HRQOL had large, clinically relevant differences. Proxy-reported scores were significantly lower than patient scores, indicating worse HRQOL. Furthermore, these differences persisted even after adjustment for clinical and sociodemographic covariates, and changes to the covariates that were included in the model had minimal effects. These findings were also robust to different approaches for addressing missing data.

In contrast to previous paired proxy-patient studies in cancer that found only small differences in proxy and patient-reported HRQOL, we found that differences between patient and proxy reports of patient HRQOL were relatively large. For example, Tang and McCorkle’s review of proxy-patient concordance studies in terminally ill cancer patients found that most studies had small mean differences for physical HRQOL dimensions, with moderate differences seen for more subjective HRQOL aspects such as fatigue and emotional function [6]. Similarly, Sneeuw and colleagues’ review of proxy-patient dyad studies in a range of disease groups, including cancer, found generally small differences between proxy-patient pairs, and only saw more extreme differences in studies with small sample sizes [7]. However, in a study with a large sample size we found relatively large and clinically important differences for both mental and physical dimensions of HRQOL. This suggests that proxies in our study may represent a sicker population. Patients who were too ill to participate in a lengthy interview may have requested that a proxy complete the interview on their behalf. It is possible that the covariates that were collected in the CanCORS survey and used in our model did not adequately capture this decision. Additional information, such as the reason for non-response and the rationale for nominating a specific person as a proxy, may have been helpful and could potentially be considered in future studies. Another possibility is that this difference may be due to proxy bias. Schwarz and Wellens suggest that proxy reporters use different sources of information when making reports compared to individuals making a self-report [48]; in this vein, Snow and colleagues note that because patients know more about themselves than proxies do, proxy bias should be larger for less observable constructs [49]. In our study, however, we found that proxy-patient differences were similar for both physical and mental health. Future qualitative research that investigates the processes and decision-making involved in proxy reporting may be beneficial in addressing some of these issues.

Our findings of similar levels of proxy-patient differences for physical and mental health outcomes are not completely consistent with evaluations of proxy response bias among elderly Medicare patients. Using the Medicare Current Beneficiary Survey, Li and colleagues found higher levels of proxy-patient difference for less observable domains, such as cognitive abilities, and lower levels for more observable domains such as mobility, with no significant differences found for highly observable domains such as seeing or eating solid foods [50]. These differences were obtained from a propensity-matched analysis, accounting for sociodemographic variables such as age and gender as well as clinical information such as the Charlson Comorbidity Index. Our analysis did not use propensity score matching, but adjusted in a regression model for several co-morbidities, disease type and stage, and sociodemographic characteristics. One possible explanation is that these co-morbidities may be less important in predicting health status for patients with cancer; it is likely that cancer, rather than, for example, a history of heart failure, is the more proximal driver of poorer health status. Another possibility is that the choice of measurement tool may be a factor. For example, Ellis and colleagues assessed patient-proxy differences among MHOS respondents using the SF-36, and found unadjusted patient scores to be approximately 7 points higher than unadjusted proxy scores for both the PCS and MCS [10]. These differences are consistent with our findings, although relatively few of the respondents in Ellis et al.’s analysis had cancer.

Although previous studies have found discrepancies between proxy and patient reports of comorbidities such as depression [46], the proportion of patients with depression was similar for both proxy- and patient-reported data in our analyses. Furthermore, in our data, the associations between comorbidities and HRQOL were similar regardless of whether or not proxy-reported data were included. One explanation for this may be that while proxy-reported rates of comorbidities were higher, the distribution of comorbidities was roughly similar between respondent types for most of the included comorbidities. Additionally, with the exception of depression in the model for the mental health composite outcome, the average difference in HRQOL between respondents with and without a given comorbidity was relatively small and not clinically significant (<5 points). Alternatively, our measure of comorbidity, which assessed the presence of comorbidity but not its severity, may have been insufficiently sensitive to capture comorbidities that were severe enough to impact HRQOL.

Conclusions regarding the impact of proxy reporting in surveys have varied, possibly due to the different outcomes for which proxy respondents have been employed. Some analyses have found the impact of including proxy reports in surveys to be minimal [51] or that adjustment can be effective in minimizing proxy-introduced bias [13], while others found that further information about proxies is required for better adjustment [12, 52]. With regard to surveys of HRQOL in cancer patients, our findings indicate that proxy reports differ significantly from patient reports, and regression adjustment using sociodemographic and clinical covariates has minimal impact on this difference. It is possible that identifying and including additional covariates predictive of patient illness may reduce these differences, particularly because the fully adjusted models explained less than 20% of the variance of both outcomes. However, even models that include symptoms as predictors of HRQOL explain less than 30% of the variance in outcomes [53]. Similar levels of explanatory power are reported for paired studies evaluating factors associated with proxy-patient concordance [54].

This study had several limitations. First, the CanCORS data, while population-based and nationally representative, were collected several years ago. However, there is no reason to expect that the age of the data would affect the validity of models addressing a methodological issue such as proxy-patient differences in HRQOL. Second, the data used in this paper are cross-sectional; however, many population-based surveys and most studies evaluating patient-proxy concordance in HRQOL employ cross-sectional designs [5]. Third, the CanCORS response rate was 51%; while not ideal, this rate is similar to the response rates for other large population-based national surveys such as the BRFSS [55] that are used to inform health policy and practice. Finally, although using ½ SD as a marker of clinical or minimally important difference is common, it is not the only metric for estimating a minimally important difference. Differences of two to three points on the SF-12 have been considered minimally important in studies with patients with prostate cancer [56]. In a study with patients with extramedullary spinal tumors, score differences of 2.8 points were proposed as minimally important for the PCS and differences of 10.7 points for the MCS [57]. Since minimal differences may vary by population and context [58], the generalizability of these varying thresholds to our study, which included patients with lung and colorectal tumors, is not clear. We did not identify a clearly established minimal difference threshold for our survey population and context in the literature. Nonetheless, in our study we identified large and persistent differences that were only minimally affected by adjustment for sociodemographic and clinical covariates.

Conclusions

In summary, proxy reports of patient HRQOL that are used to substitute for otherwise missing patient reports are clinically and statistically different from available patient reports. Adjustments using frequently employed sociodemographic, clinical and comorbidity covariates have a minimal effect on these differences. In situations of high rates of proxy use, the effect of proxies can be consequential, particularly if the results of such estimates will be employed in performance measures or used to inform policy decisions.