Background

It is common, if not usual, practice to include health-related quality of life (HRQL) measures in clinical trials in oncology. To justify the cost of new cancer drugs, decision-makers need to determine not only whether a drug has a statistically significant impact on survival and/or HRQL, but also whether the improvement is meaningful. This is particularly important in lung cancer, where aggressive new therapies are being brought to market. In addition to the use of cancer-specific measures such as the European Organization for Research and Treatment of Cancer QLQ-C30 (EORTC QLQ-C30) [1, 2] and the Functional Assessment of Chronic Illness Therapy (FACIT) measurement system [3], clinical trials in oncology are increasingly incorporating generic preference-based measures such as EQ-5D. EQ-5D is an indirect measure of utility for health that generates an index-based summary score based upon societal preference weights [4]. Utility scores enable comparisons of burden of disease across conditions and the calculation of quality-adjusted life-years (QALYs), an outcome used to compare the cost effectiveness of health care technologies.

A major challenge in HRQL measurement is the interpretation of scores, particularly with respect to defining what constitutes a minimally important difference (MID). The MID has been defined as the smallest change in a patient-reported outcome (PRO) measure that is perceived by patients as beneficial or that would result in a change in treatment [5]. Approaches to estimation of MIDs have been classified as either distribution-based or anchor-based [6]. Anchor-based approaches compare changes seen in an individual's HRQL to an external criterion, such as a clinical measure or a patient-rated global change question. Problematically, no single anchor represents a gold standard and no approach is ideal. Norman et al. (1997) found that retrospective global ratings of change have questionable ability to yield information about treatment effects [7]. Alternatively, distribution-based approaches rely on the distribution of scores and are computed using variations on effect size [8]. The main disadvantage of distribution-based techniques is that they do not provide insight into the importance of the difference [9]. Often both approaches are combined: anchor-based HRQL changes, initially framed in terms of the individual, are then further analyzed as a group using distribution-based methods [10–15].

While MIDs have been estimated for EQ-5D index-based scores for some conditions [16], empiric work has not been performed in cancer. Additionally, it is not clear if lung cancer has a different range of MID estimates. Thus, the aim of this study was to provide a range of estimates for meaningful difference in EQ-5D scores in cancer and to determine if MIDs for lung cancer are different from all cancers.

Methods

Study design

A retrospective analysis was conducted on cross-sectional data collected from 534 cancer patients with eleven types of cancer who participated in a validation study of cancer symptom scales [17]. Participants had advanced (stage 3 or 4) cancer of the bladder, brain, breast (female patients only), colon/rectum, head/neck, liver/pancreas, kidney, lung, lymphoma, ovary (female patients only), and prostate (male patients only). All patients had received at least 2 cycles of chemotherapy or, if chemotherapy was non-cyclical, had been receiving it for at least 1 month. Efforts were made to recruit 50 patients for each type of cancer, with approximately equal proportions of male and female patients for the non-gender-specific types of neoplasm. This dataset included 50 lung cancer patients, and between 50 and 52 patients with each of the other types of cancer except bladder cancer (n = 31).

The patients were recruited from six sites within the National Comprehensive Cancer Network (NCCN) and the Cancer Health Alliance of Metropolitan Chicago (CHAMC). The NCCN is a not-for-profit, tax-exempt corporation that is an alliance of National Cancer Institute (NCI) approved comprehensive cancer centers. The CHAMC organizations provide social, emotional and informational support services to cancer patients free of charge. These organizations are not affiliated with a medical center or university, and each CHAMC agency serves different geographical and socio-demographic cancer patient populations. All patients who completed the questionnaires consented to participate in the study. Institutional review board approval was obtained for secondary data analysis (University of Illinois at Chicago research protocol #2006-0891).

Measures

Patients completed several questionnaires, including the EQ-5D and the Functional Assessment of Cancer Therapy (FACT). The EQ-5D descriptive system consists of 5 dimensions: Mobility, Self-Care, Usual Activities, Pain/Discomfort, and Anxiety/Depression, each with 3 levels (e.g. no problems, moderate problems, extreme problems) [18]. Index-based summary scores were calculated using 2 different algorithms, each based on societal preference weights derived from general population-based valuation studies in the United Kingdom [19] and the USA [20]. The index-based score is typically interpreted along a continuum where 1 represents best possible health and 0 represents dead, with some health states being worse than dead (<0). In addition to the self-classifier, respondents rate their health today using a 20 centimeter visual analogue scale (VAS) that ranges from 0 (worst imaginable health state) to 100 (best imaginable health state).

Participants also completed the Functional Assessment of Cancer Therapy (FACT) quality of life questionnaire using a version specific to their tumor type. The general subscales common to all versions (FACT-G) include physical well-being (PWB), social/family well-being (SFWB), emotional well-being (EWB), and functional well-being (FWB). The FACT-G total score (FACT-G Total) is based on 26 summed items (responses 0 to 4) from the PWB (7 items), FWB (7 items), SFWB (6 items), and EWB (6 items). Higher scores represent better quality of life.

Performance status was evaluated using the Eastern Cooperative Oncology Group (ECOG) classification system [21]. ECOG grades range from 0 (fully active) to 4 (completely disabled); grade 5 denotes death. ECOG grades are used by physicians and researchers to assess progression of disease and its impact on daily activities, and to guide appropriate treatment and prognosis.

Analysis

Both anchor-based and distribution-based approaches were used to estimate MIDs for the EQ-5D in the overall cancer cohort and, when possible, in the subgroup of lung cancer patients. Distribution-based criteria included 1/2 standard deviation (SD) and the standard error of the measure (SEM) [22]. For consistency with past studies exploring MIDs, 1/3 SD was also reported, but it was not included in the summarized range of MIDs as there is less evidence to support that 1/3 SD represents an important difference. The SEM is calculated as $\sigma_x \sqrt{1 - r_x}$, where $\sigma_x$ is the standard deviation of the scores and $r_x$ is the reliability of the measure. It is debatable which type of reliability, internal consistency or test-retest (TRT) reliability, is most appropriate. Very limited evidence of TRT reliability is available on the EQ-5D in cancer [4]. Because the EQ-5D has single-item dimensions, internal consistency reliability does not apply to each dimension. Although HRQL is considered a multi-dimensional construct, the aggregation of dimensional responses to create a single summary score is an implicit endorsement of HRQL as an overarching construct. However, item response theory-based analysis of the dimensional structure of the EQ-5D has indicated that the anxiety/depression dimension taps into a construct distinct from the other 4 items [23]. Internal consistency reliability, calculated using Cronbach's alpha, was 0.68 regardless of whether or not anxiety/depression was included. Thus, for the purposes of our analysis, a reliability of 0.68 was used in the calculation of the SEM.
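As an illustration of the distribution-based criteria, the following Python sketch computes 1/2 SD, 1/3 SD, and the SEM from a standard deviation and a reliability coefficient. The function name and dictionary keys are hypothetical. The inputs are the overall-cohort SDs reported in the Results (UK 0.22, US 0.15) together with the reliability of 0.68; because the tabulated MIDs were derived within anchor-based strata, this sketch is not expected to reproduce them exactly.

```python
import math


def distribution_based_mids(sd, reliability):
    """Return distribution-based MID criteria for one score distribution.

    sd          -- standard deviation of the scores
    reliability -- reliability coefficient r of the measure (0 to 1)
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return {
        "half_sd": 0.5 * sd,
        "third_sd": sd / 3.0,
        # SEM = SD * sqrt(1 - r)
        "sem": sd * math.sqrt(1.0 - reliability),
    }


# Illustrative inputs only: overall-cohort SDs, not stratum-level SDs.
uk = distribution_based_mids(sd=0.22, reliability=0.68)
us = distribution_based_mids(sd=0.15, reliability=0.68)

for label, result in (("UK", uk), ("US", us)):
    print(label, {name: round(value, 3) for name, value in result.items()})
```

With a reliability of 0.68, the SEM is roughly 0.57 times the SD, which is why the SEM and 1/2 SD criteria yield similar values in this setting.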

Anchors can be constructed using clinically-based criteria, such as response to treatment, or more subjective criteria, e.g. health status. We used physician-assessed ECOG grades to group patients into categories of performance status, and determined mean difference scores between ECOG grades. Distribution-based criteria were then applied to the statistics associated with each anchor-based category. A second anchor-based approach used FACT-G scores. The cohort was stratified into quintiles based on FACT-G summary scores. Grouping the cohort into quintiles approximated an appropriate threshold for stratifying patients based on MID estimates for the FACT-G, which have been identified as close to 6 in previous studies: 6–7 in hepatobiliary carcinoma [13], and 5–6 in breast cancer [10]. Final results were summarized as a range of MID estimates and as an average MID across categories, weighted by the sample size within each category.
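The anchor-based steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the data are made-up values chosen only to show the mechanics, and each adjacent difference is weighted by the combined sample size of the two categories it compares, which is one possible reading of "weighted by the sample size within each category".

```python
from statistics import mean


def split_into_quantile_groups(pairs, k=5):
    """Sort (anchor, outcome) pairs by anchor score, best first, and
    split the outcome scores into k nearly equal-sized groups."""
    ordered = sorted(pairs, key=lambda p: p[0], reverse=True)
    n = len(ordered)
    bounds = [i * n // k for i in range(k + 1)]
    return [[y for _, y in ordered[lo:hi]] for lo, hi in zip(bounds, bounds[1:])]


def adjacent_mean_differences(groups):
    """Return (mean difference, combined n) for each pair of adjacent
    anchor categories, ordered from best to worst."""
    return [
        (mean(better) - mean(worse), len(better) + len(worse))
        for better, worse in zip(groups, groups[1:])
    ]


def weighted_average(diffs):
    """Average the differences, weighting each by its sample size."""
    total_n = sum(n for _, n in diffs)
    return sum(d * n for d, n in diffs) / total_n


# Made-up (anchor score, utility score) pairs, for illustration only.
pairs = [
    (100, 0.95), (95, 0.90), (90, 0.85), (85, 0.80), (80, 0.75),
    (75, 0.70), (70, 0.65), (65, 0.60), (60, 0.55), (55, 0.50),
]

groups = split_into_quantile_groups(pairs, k=5)
diffs = adjacent_mean_differences(groups)
average_mid = weighted_average(diffs)
print(round(average_mid, 3))
```

For an ordinal anchor such as ECOG grade, the quintile step is unnecessary: the outcome scores would be grouped directly by grade and passed to `adjacent_mean_differences`.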

Results

Similar demographic characteristics were observed in the overall cancer sample and the lung cancer subgroup (Table 1). A wide range of scores was observed in the overall cancer cohort, with UK-based scores ranging from worse than dead (-0.14) to full health (1.0). A smaller range was observed for the US-based scores (0.21 to 1.0). Compared to the mean (SD) scores for the overall cohort [UK 0.72 (SD 0.22); US 0.78 (SD 0.15)], the subgroup with lung cancer had lower mean utility scores but similar dispersion around the mean [UK 0.67 (SD 0.22); US 0.74 (SD 0.16)] (Table 2). Mean VAS scores for the lung cancer subgroup [68 (SD 18)] were the same as for the overall cancer cohort [68 (SD 20)].

Table 1 Patient characteristics, all cancers and lung cancer subgroup
Table 2 Patient EQ-5D and FACT-G summary scores, all cancers and lung cancer subgroup

For all cancer patients, mean difference scores anchored by ECOG status ranged from 0.09 to 0.16 for UK scores and from 0.07 to 0.11 for US scores (Table 3). Across ECOG-based strata, MIDs based on the SEM and 0.5 SD were similar, ranging from 0.08 to 0.16 for UK scores, and from 0.06 to 0.10 for US scores. For the lung cancer cohort (excluding the single patient with grade 3 PS), mean difference scores between ECOG levels ranged from 0.10 to 0.13 (UK scores), and from 0.07 to 0.09 (US scores). MIDs based on SEM and 0.5 SD ranged from 0.08 to 0.14 (UK scores), and from 0.07 to 0.12 (US scores).

Table 3 EQ-5D index-based utility scores by ECOG grade, overall and lung cancer

Average mean estimates of MIDs across FACT-G based quintiles for the overall cancer cohort were 0.09 for UK scores and 0.06 for US scores (Table 4). Using distribution-based criteria averaged across quintile-based groups, MIDs for the overall cohort were: SEM (UK) = 0.10, 1/2 SD (UK) = 0.09; SEM (US) = 0.07, 1/2 SD (US) = 0.06. For the lung cancer subgroup, average MIDs between quintiles were 0.10 (UK) and 0.07 (US), with SEM (UK) = 0.09, 1/2 SD (UK) = 0.08; SEM (US) = 0.06, 1/2 SD (US) = 0.06.

Table 4 MID estimates for EQ-5D Index-based scores by FACT-G quintile subgroups

MID estimates for EQ-5D VAS scores based on FACT-G score quintiles were the same for the overall cancer cohort and the lung cancer subgroup (Table 5). MIDs for VAS scores ranged from 7 to 10 when averaged across the anchor-based categories using FACT-G quintiles: the average mean difference between quintile categories was 7, compared with 10 according to the SEM and 9 using 1/2 SD. MIDs for VAS scores tended to be slightly larger when ECOG grade rather than FACT-G quintile was used to anchor difference scores, ranging from 8 to 11 (all cancers) and 7.5 to 11.5 (lung cancer).

Table 5 MID estimates for EQ-5D VAS scores by ECOG grade and FACT-G quintile

Discussion

Interpretation of scores is an important issue in the field of HRQL measurement, but there is no consensus on the most appropriate method for assessing the ability of an instrument to capture meaningful differences. In this study, we followed criteria established in previous investigations of MIDs [13–15]. We found that distribution-based and anchor-based estimates tended to converge, helping to triangulate support for the validity of the range of MID estimates. In addition, the MIDs for the overall cancer and lung cancer cohorts were similar.

The issue of what constitutes an MID on a measure of HRQL is part of an ongoing dialogue about issues of interpretation. Developers of HRQL measures have not been forthcoming in the literature in explicitly attempting to establish MIDs. One reason for this reluctance is that clinically important differences may vary with the target population. Limitations in the scaling properties of a measure can contribute to inconsistent MID estimates, as they may depend upon where a patient or group falls along the continuum of the measure. Distribution-based approaches for estimating important differences rely on the assumption of normality, and ceiling effects, particularly in healthier patient populations, produce skewed score distributions. Although ceiling effects have been associated with the use of EQ-5D [24], a ceiling effect was generally not observed in the cancer cohort, and standard deviations were relatively stable across the anchor-based strata.

MID estimates for EQ-5D in this study can be compared to other studies that have examined important differences using EQ-5D. A previous study by Walters and Brazier compared minimally important differences between SF-6D and EQ-5D, and reported a mean MID of 0.074 for the UK-based algorithm [16]. Their estimate was at the lower end of the range of MIDs estimated in this study for cancer patients, which may imply that MIDs in cancer are slightly larger than for the conditions they investigated, which included leg ulcer, back pain, early rheumatoid arthritis, limb reconstruction, osteoarthritis, irritable bowel syndrome, and chronic obstructive lung disease. An alternative explanation is that the anchors used in this study, particularly ECOG grade, provided benchmarks for meaningful differences that do not necessarily represent a minimally important difference.

MIDs are often estimated using longitudinal datasets, but difference scores based on changes over time were not available in this dataset, which was cross-sectional. However, the MIDs for EQ-5D UK-based utility scores reported using longitudinal data [16] were comparable to the estimates for UK scores generated in this study. Another limitation of our study was that the sample size for the lung cancer subgroup was small. When further stratified by ECOG grade, sub-sample sizes became extremely small and produced unreliable estimates in the lung cancer subgroup, although the average MID obtained in lung cancer tended to be similar to that of the overall cancer cohort. It is unclear if MIDs based on patients with advanced cancer in this study generalize to patients with less advanced stages of cancer.

An additional issue for users of EQ-5D is the selection of preference-based algorithm. As observed in this study, MIDs varied with the selection of the algorithm. MIDs for EQ-5D UK index-based utility scores ranged from 0.08 to 0.16 with a mode of 0.10. For US-based scores, a range of 0.06 to 0.12 was reported, with a mode of 0.07. This result was not unexpected, as the US preference-based algorithm produces scores with a smaller range than the UK scores, resulting in smaller difference scores and smaller standard deviations, thus smaller MIDs.

Conclusion

In summary, important differences in EQ-5D summary scores were similar for all cancers and lung cancer, with the lower bounds likely to represent a closer estimate of the true MID, i.e. 0.08 for UK-based scores, 0.06 for US-based scores, and 7 for VAS scores. MIDs for EQ-5D UK-based utility scores in cancer were similar to estimated MIDs for other conditions in the published literature. To our knowledge, MIDs for EQ-5D VAS scores and US-based utility scores have not been previously reported. Across the different approaches, MIDs for US-based utility scores were consistently smaller than MIDs for UK-based utility scores.