Background

In addition to assessing clinical efficacy, appraisals of new healthcare technology need to assess cost-effectiveness. Cost-utility analysis is frequently used for economic evaluation, with outcomes evaluated in terms of quality-adjusted life years, a measure that combines both the length and quality of life. Utilities are preference-based and derived from each individual, either directly using valuation techniques such as standard gamble, time trade-off, or a rating scale, or indirectly using generic health-related quality of life (HRQoL) measures, such as the Health Utility Index [1, 2], the EuroQol 5D (EQ-5D) [3], or the Short Form 6D [4]. Scoring algorithms have been developed for all of these measures, which provide community-based health utility estimates [5].

HRQoL is often used as a secondary endpoint in cancer trials. Studies measuring patient quality of life often prefer disease-specific instruments over generic instruments. The former focus on particular health problems and tend to be more sensitive to clinically important differences [6]. They do not, however, include utility scoring systems. A tool that maps disease-specific measures onto preference-based measures would therefore allow weighted utilities to be generated from disease-specific data.

The European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 (EORTC QLQ-C30) is the instrument most frequently used to measure the quality of life of cancer patients [7]. The Korean version of the EORTC QLQ-C30 has been validated for use in Korean cancer patients [8]. Although the EORTC-8D, a preference-based measure derived from the EORTC QLQ-C30, was recently introduced [9], the results obtained from these questionnaires cannot be compared with the results of questionnaires based on other disease areas because the EORTC QLQ-C30 is a cancer-specific instrument. Moreover, to our knowledge, no valuation set for the EORTC-8D has yet been developed in Korea. The EQ-5D, an instrument widely used to measure and evaluate general health status, can also be used to assign preference values to these health states. Population tariffs using the EQ-5D have been developed in several countries, including Korea [10, 11]. Although the EORTC QLQ-C30 has been mapped onto EQ-5D utilities [5, 12–14], those studies were limited to patients with a single type of cancer.

The purpose of this study was to develop a mapping relationship between the EORTC QLQ-C30 and EQ-5D-based utility values at the individual level for patients with a wide range of cancers.

Methods

Data set and instruments

We used two data sets to formulate the mapping algorithm for the EORTC QLQ-C30 and the EQ-5D. The derivation set comprised 893 patients with different types of cancer [15], whereas the external validation set comprised 123 patients with colon cancer [16]. The patients in these two studies were independent of each other, but were recruited at the same cancer center.

The EQ-5D comprises five dimensions that measure general health status: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression, with each dimension having three levels. Thus, the EQ-5D provides a simple descriptive profile and a single utility index of health status, which can be used in the clinical and economic evaluation of healthcare, as well as for population health surveys [10]. The EQ-5D index for use in Korea was calculated using an algorithm [11], with possible scores ranging from −0.171 to 1.0, with 1.0 indicating full health (11111 state) and 0.0 denoting death.

The EORTC QLQ-C30 is an integrated system that assesses the HRQoL of cancer patients. It includes five functional scales (physical, role, emotional, cognitive, and social), three symptom scales (fatigue, nausea or vomiting, and pain), global health status, and six single items (dyspnea, insomnia, appetite loss, constipation, diarrhea, and financial difficulties) [7]. All of these scales and items were linearly transformed from 0 to 100 according to the EORTC QLQ-C30 scoring rules [17]. High scores on the functional scales indicate a high level of functioning and high scores on the global health status indicate a high quality of life; by contrast, high scores on the symptom scales/items indicate high levels of health problems [17].
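The linear transformation mentioned above can be illustrated with a minimal sketch. The text states only that scores were transformed to a 0 to 100 range according to the scoring rules [17]; the formulas below are our reading of the standard published rules (raw score as the mean of the items, item range of 3 for items answered 1 to 4 and of 6 for the global health items answered 1 to 7) and should be checked against the manual. The function names are hypothetical, and missing-item handling is not covered.

```python
def raw_score(items):
    """Raw score: mean of the item responses belonging to one scale."""
    return sum(items) / len(items)


def functional_score(items, item_range=3):
    """Functional scales: reversed so that a higher score means better functioning.

    Assumes items are answered 1-4 (range 3), as in the standard scoring rules.
    """
    return (1 - (raw_score(items) - 1) / item_range) * 100


def symptom_score(items, item_range=3):
    """Symptom scales and single items: a higher score means more problems."""
    return (raw_score(items) - 1) / item_range * 100


def global_health_score(items, item_range=6):
    """Global health status: items answered 1-7 (range 6); higher is better."""
    return (raw_score(items) - 1) / item_range * 100
```

For example, a patient answering "not at all" (1) to every physical functioning item would receive a functional score of 100, whereas answering "very much" (4) to every pain item would give a symptom score of 100.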

Analysis

Ordinary least squares (OLS) regression was used to estimate the EQ-5D index from the EORTC QLQ-C30. The dependent variable was the EQ-5D index, and the explanatory variables were the EORTC QLQ-C30 scale and item scores. All variables were treated as continuous variables. The full model, which included the scores for all scales and items in the EORTC QLQ-C30, was explored and another model was developed using backward elimination with a significance level of 0.1 from the full model.
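In general form, the regression described above can be written as follows. The notation is ours and is introduced only for illustration: x_{ik} denotes the 0 to 100 score of patient i on the k-th EORTC QLQ-C30 scale or item, K is the number of explanatory variables retained (15 in the full model, counting the five functional scales, three symptom scales, global health status, and six single items), and the hats denote OLS estimates.

```latex
\[
\mathrm{EQ5D}_i = \beta_0 + \sum_{k=1}^{K} \beta_k\, x_{ik} + \varepsilon_i ,
\qquad
\widehat{\mathrm{EQ5D}}_i = \hat{\beta}_0 + \sum_{k=1}^{K} \hat{\beta}_k\, x_{ik}
\]
```

Backward elimination starts from K = 15 and repeatedly drops the variable with the largest p value until all remaining variables meet the 0.1 significance level.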

The relationships between the observed and predicted values were assessed visually. The performance of each model was evaluated by determining its predictability, goodness of fit, and the signs of the estimated coefficients. The purpose of a mapping function is to predict health utility values in other data sets; therefore, the model was assessed according to the accuracy of its predictions [18]. Predictive ability was determined by calculating the mean absolute error (MAE), the estimated proportions with absolute errors >0.05 and >0.1, and the root-mean-squared error (RMSE). The MAE is the average of the absolute differences between the observed and predicted values, and the RMSE is the square root of the average of the squared differences. RMSE can also be reported as a percentage of the scale size (i.e., 1.171, the range of the EQ-5D-based utility according to the Korean algorithm [11]), referred to as the normalized RMSE [19]. Smaller MAE and RMSE values indicate better model performance. The important aspect of the mapping was the estimated group mean and its variance, rather than individual estimated utilities. To determine whether errors were affected by disease severity, both the highest and the lowest EQ-5D index quartile groups of the derivation and validation sets were evaluated separately. The overall equality of the coefficients of the good health group and other groups was tested using the likelihood ratio test. In addition, the adjusted R2 values were calculated and the signs of the estimated coefficients were examined. The sign of the functional scales was expected to be positive, while that of the symptom scales/items was expected to be negative.
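The predictive-ability measures defined above can be sketched as follows. This is an illustration under stated assumptions and not the SAS code used in the study: the function name and dictionary keys are hypothetical, and the default scale range of 1.171 is the range of the Korean EQ-5D index given in the text.

```python
import math


def prediction_errors(observed, predicted, scale_range=1.171):
    """Summarise prediction error between observed and predicted utilities.

    scale_range defaults to 1.171, the width of the Korean EQ-5D index
    (-0.171 to 1.0); pass a different value for another tariff.
    """
    abs_err = [abs(o - p) for o, p in zip(observed, predicted)]
    n = len(abs_err)
    mae = sum(abs_err) / n                            # mean absolute error
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)  # root-mean-squared error
    return {
        "mae": mae,
        "rmse": rmse,
        "nrmse_pct": 100 * rmse / scale_range,        # RMSE as % of scale size
        "prop_ae_gt_0.05": sum(e > 0.05 for e in abs_err) / n,
        "prop_ae_gt_0.10": sum(e > 0.10 for e in abs_err) / n,
    }


# Illustrative values only (not study data)
summary = prediction_errors([1.0, 0.8, 0.6, 0.4], [0.92, 0.8, 0.8, 0.37])
```

Applying the same function separately to the lowest and highest EQ-5D quartile groups gives the severity-specific comparison described above.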

Statistical analyses were performed using SAS software (ver. 9.1; SAS Institute Inc., Cary, NC). P<0.05 was considered statistically significant.

Results

The derivation set included patients with 28 different types of cancer (Table 1). Breast cancer was the most common (32.9%), followed by colorectal cancer (20.0%). Table 2 presents the descriptive statistics for the EQ-5D index and the EORTC QLQ-C30 scales of the derivation and validation sets. Patients in the derivation set generally had poorer scores on all scales (except for diarrhea) than patients in the validation set. The differences between the two sets in scale and item scores were statistically significant, except for “constipation” and “financial difficulty”.

Table 1 Distribution of cancer patients in the derivation set
Table 2 Descriptive statistics of the EQ-5D index and the EORTC QLQ-C30 scales used in the derivation and validation sets

The results of the OLS regression analysis of each of the two models are shown in Table 3, and model performance is shown in Table 4. In Model 2 (i.e., the backward elimination model), the five scales were statistically significant; the emotional functioning scale, which had a p value of 0.071 in Model 1, became statistically significant in Model 2, with a p value of 0.01. Physical functioning was the most influential scale in both models (Table 3). The explanatory power of Model 2 was 51.6%. The MAE values of both models were the same: 0.095 for the derivation set and 0.066 for the validation set. In Model 2, the normalized RMSE was 8.1% for the derivation set and 7.2% for the validation set. The proportion of absolute errors >0.1 in Model 2 was 23.1% for the derivation set and 24.4% for the validation set. The actual mean value of the EQ-5D index was similar to the predicted EQ-5D indices of both models (Table 4). Figure 1 shows a plot of the predicted value based on Model 2 versus the observed EQ-5D index in both the derivation and validation sets. In both sets, EQ-5D index values below 0.7 tended to be overestimated, whereas the maximum value of the EQ-5D index was underestimated.

Table 3 Ordinary least squares regression model
Table 4 Comparison of the performance of Models 1 and 2
Figure 1 Scatter plot of predicted values based on Model 2 parameters versus the actual EQ-5D index. A perfect fit is indicated by the 45° reference line.

Table 5 shows the model performance for both the derivation and validation sets according to health status when Model 2 was fitted. The MAEs of the lowest quartile group on the EQ-5D for the derivation (≤0.723) and validation (≤0.817) sets were 0.100 and 0.060, respectively, whereas the MAEs of the highest quartile group on the EQ-5D for the derivation (≥0.907) and validation (≥1) sets were 0.067 and 0.060, respectively. The regression coefficients of the lowest and highest quartile groups were not equal overall (p=0.021). In both data sets, the mean predicted value was overestimated in the lowest quartile group, but underestimated in the highest quartile group.

Table 5 Performance in Model 2 according to EQ-5D quartile in the derivation and validation sets

Discussion

This study explored an algorithm for mapping the EORTC QLQ-C30 onto the EQ-5D index. Model 2, which included global health status, physical functioning, role functioning, emotional functioning, and pain as explanatory variables, was preferred over the full model, due to its predictability, logical consistency, and parsimony. Although mapping the EORTC QLQ-C30 onto the EQ-5D index has been assessed previously, those studies evaluated patients with specific types of cancer, including gastric [5], esophageal [13], and breast [12, 14] cancers and multiple myeloma [19]. By contrast, the present study evaluated patients with 28 different types of cancer, providing our mapping model with the advantage (over earlier mapping algorithms) of being applicable to all cancer patients in Korea.

We also explored models using only functional scales as explanatory variables and a backward elimination model of the functional scale (data not shown). We found that Model 2 showed optimum performance, retaining the global health, physical, role, emotional functioning, and pain scales. The MAE of this model, 0.066, was lower than the MAE of 0.092 reported in another Korean study [14]. The adjusted R2 of our derivation set was 0.516 and the normalized RMSE was 8.1%. We also analyzed our data based on the UK tariff [20] using backward elimination regression. Although the remaining variables were the same as those using the Korean tariff, the magnitude of the absolute coefficients increased. The MAE of the UK backward elimination model for our derivation set was 0.156, and the adjusted R2 was 0.463. A systematic review reported that R2 statistics for condition-specific measures relative to generic measures generally ranged from 0.4–0.6 [18]. Another study showed that the backward regression model resulted in better predictability than the full model, with the former showing an adjusted R2 of 0.8 and a normalized RMSE of 6.02% for the derivation set; that study, however, included squared terms, such as the square of the physical functioning scale [12]. Use of OLS with a stepwise regression model retaining three scales (global health, physical, and emotional functioning) yielded an adjusted R2 of 0.611 and a normalized RMSE of 12.0% for the derivation set [5]. Application of OLS regression using all of the scale scores in patients with esophageal cancer resulted in variables slightly different from those previously reported, including global health, role, emotional, cognitive function, pain, and fatigue, with an adjusted R2 of 0.611 for the derivation set [13].

OLS regression tends to overestimate the true value of EQ-5D utilities for patients in poor health, while underestimating the true EQ-5D utilities at the upper end of the scale [14, 21, 22]. Our model showed the same trend, overpredicting the mean EQ-5D index in the group of patients in relatively poor health. The MAE of the best performing model increased from 0.056 in the relatively healthy group to 0.078 in the group with relatively poor health. Caution is therefore warranted when applying a mapping function to patients in poor health, and further research is needed regarding the cut-off points for the use of the EORTC QLQ-C30 on patients in poor health. Mapping is the second-best alternative to the direct use of a preference-based measure because mapped estimates can yield large errors, particularly when mapping from condition-specific to generic preference-based measures [18]. This, however, may not be as important for QLQ-C30 mappings.

The mapping algorithm formulated in this study may have limited generalizability, because the participants in the validation set came from only one hospital. Although our sample included individuals with various conditions, further research with samples from other institutions would be helpful.

Conclusions

The mapping model using OLS regression showed reasonable predictive ability. This mapping algorithm may enable researchers to convert results from the EORTC QLQ-C30 to EQ-5D utility indices for Korean cancer patients. Nevertheless, using OLS regression to predict very low and very high EQ-5D indices remains challenging. These findings may be useful when conducting cost-utility analyses of healthcare interventions in cancer patients.