The Agency for Healthcare Research and Quality estimated that in 2014 the cost of oncology health care in the United States (US) was $87 billion [1]. Various factors have contributed to an increase in total medical expenditures (MEs) [2], bringing about a pressing need to conduct comparative evaluations of medical interventions. Such value-based appraisals of therapies are rapidly becoming an integral part of the body of evidence informing reimbursement decisions. While the cost of therapies can be objectively measured, assessments of many health outcomes (such as symptom severity, functional impacts, and quality of life) are based on subjective evaluations that reflect patients’ experiences. The patient-reported outcome (PRO) measures used to collect these data are often not well understood by the various stakeholders involved in medical decision-making [3, 4]. Hence, there is a clear need to translate scores obtained from PRO measures to other metrics, such as MEs or number of hospitalizations, which are better understood by stakeholders. Additionally, linking PRO scores to real-world outcomes has the potential to further their use in clinical settings [5,6,7], creating opportunities for collaborative decision-making between patients and health care providers [8, 9].

Overall, the impact of specific score differences in generic health-related quality of life (HRQoL) on oncology health care resource utilization (HCRU) outcomes has been neither well studied nor quantified. Previous research has studied the link between PRO scores and HCRU in patients with cancer, showing that measures of symptom severity and health status are associated with outcomes related to HCRU, such as emergency room visits [10,11,12,13], hospitalizations [11, 13, 14], outpatient visits [11,12,13], post-operative complications [11, 15], MEs [12, 16], and use of medications [13]. Only one study [11] has calculated the impact of specific score differences on the use of health care. That study estimated that each 5-point increase on the functional well-being score of the Functional Assessment of Cancer Therapy—General Population (FACT-GP) was associated with a 27% decrease in the odds of post-operative morbidity, which included emergency room and outpatient visits.

The objective of this study was to estimate a quantitative link between PRO scores and HCRU over a period of approximately 6 months following assessment of HRQoL among patients with cancer. The PRO scores included the physical and mental component summary scores of the SF-12v2® health survey (SF-12v2), a widely used generic measure of HRQoL, and the SF-6D, a utility score that can be derived from the SF-12v2 [17]. Data stemmed from the Medical Expenditure Panel Survey (MEPS), a large-scale observational study conducted in the US.


Data source

The MEPS is a panel-based, nationally representative survey study of approximately 15,000 households, their medical providers, and employers across the US [18]. As part of MEPS, each panel is interviewed five times at intervals of approximately 6 months, spanning two calendar years. Household interviews consist of a series of questions on specific topics (i.e., demographic characteristics, health conditions, satisfaction with care). After completion of the household interview, MEPS also contacts a sample of medical providers to obtain supplemental information, such as dates of visits, use of medical care services, charges, and sources of payments. Expenditure estimates in MEPS are based on self-reported medical care events and information from medical providers obtained through a follow-back survey. MEPS data are directly available from the following website:

Study sample

The analyses used MEPS data from adult (18 years old or older) participants who completed the SF-12v2 and reported being diagnosed with cancer or a malignancy that was not in remission. MEPS participants were identified as active oncology patients if they answered “Yes” to the question “Have you ever been diagnosed as having cancer or a malignancy of any kind?” and “No” to the question “Is <cancer or malignancy> in remission, that is, is the <cancer or malignancy> under control?” Data from these patients were extracted from the MEPS consolidated yearly files of 2008 (panels 12 and 13), 2010 (panels 14 and 15), and 2012 (panels 16 and 17), and the medical event files for the same MEPS panels. Each yearly file provides a single SF-12v2 assessment per subject from two different panels, which was linked to the expenditures and medical events reported in the medical event files for the reference period subsequent to the SF-12v2 assessment.

Cancer is included in the priority conditions section within the MEPS interview. Priority conditions were selected because of their relatively high prevalence and well-developed standards for appropriate clinical care. Condition data were collected at the person-by-round level (indicating if the person was ever diagnosed with the condition) and at the condition level (as expenditures and events associated with the condition). The type of cancer or malignancy was noted verbatim by the MEPS interviewer and recoded into standard cancer malignancy types indicating the type of organ affected (e.g., breast, brain, colon). Based on the cancer malignancy codes included in MEPS, a cancer type variable was created for cancers reported by 30 or more patients. Cancer diagnoses with a lower response frequency count were coded in MEPS as “other.”


Participant information was included as covariates to control for the differential effects of sociodemographic variables. These covariates included age, gender, marital status (not married, married), education (less than a high school degree, high school degree, some college or associate degree, college degree, graduate education), employment status (not employed during round, employed during round), insurance status (insured any time, not insured), and number of medical conditions. As part of MEPS, medical conditions were identified by spontaneous self-report, after being probed for particular diagnoses during the priority conditions section, or any medical event or disability days linked to a specific condition.

Health-related quality of life

The SF-12v2 is a 12-item survey for measuring functional health and well-being from the patient’s point of view. It consists of eight domains: physical functioning, role limitations due to physical problems, bodily pain, general health, vitality, social functioning, role limitations due to emotional problems, and mental health. The two component summary scores, the physical component summary (PCS) and the mental component summary (MCS), are derived from the eight domain scores and are T scores with a mean of 50 and standard deviation of 10 in the US general population [19]. Higher PCS and MCS scores represent better health. The SF-6D is a health-utility measure that can be scored from the SF-12v2 [17] on a metric anchored by 1 (perfect health) and 0 (death). The current scoring used utility weights from Brazier et al. [17]. The SF-12v2 survey has been used and validated to assess health status of patients with cancer [20,21,22,23,24,25]. The SF-12v2 was captured in rounds two and four of MEPS, which occur approximately 1 year apart (Fig. 1). As previously described, each consolidated yearly file includes data from two panels. Thus, for the earlier panel of each yearly file the SF-12v2 assessment occurred in round four, while for the later panel it occurred in round two (e.g., for 2008, the SF-12v2 was captured in round four for panel 12 and in round two for panel 13).
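The panel-to-round rule described above can be sketched in a few lines. This is an illustration only: the function name `sf12_round` is hypothetical, and the sketch assumes that MEPS panel p begins in calendar year 1995 + p (so panel 12 begins in 2007), which is consistent with the year and panel pairs listed in the text.

```python
def sf12_round(file_year: int, panel: int) -> int:
    """Return the MEPS round (2 or 4) holding the SF-12v2 for a panel in a yearly file.

    Assumption: panel p starts in calendar year 1995 + p (panel 12 -> 2007).
    """
    first_year = 1995 + panel
    if file_year == first_year:
        return 2  # later panel of the file: first year of participation
    if file_year == first_year + 1:
        return 4  # earlier panel of the file: second year of participation
    raise ValueError(f"panel {panel} is not covered by the {file_year} file")


# Year/panel pairs used in the study sample.
for year, panels in {2008: (12, 13), 2010: (14, 15), 2012: (16, 17)}.items():
    for p in panels:
        print(year, p, sf12_round(year, p))
```
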

Fig. 1 Data collection and reference periods of key outcome variables used in the current study

Health care resource utilization (HCRU)

In MEPS, data for medical events that occurred within a calendar year are provided in seven separate data files, according to the type of care: prescribed medicines, hospital inpatient stays, emergency room visits, outpatient visits, office-based provider visits, home health, and other medical expenses. The round in which the event occurred is indicated for each event, along with the total expenditures associated with the event. In the current study, these data were extracted from the medical event files associated with panels 12 through 17, as described above, and totals were calculated for the reference period between the round of the SF-12v2 assessment and the subsequent round (e.g., when the SF-12v2 was captured in round four, the HCRU reference period is the period between interview rounds four and five; see Fig. 1). For each person, total MEs were calculated as the sum of out-of-pocket or insurance payments pertaining to all seven sources described above, while utilization frequency was calculated as the sum of inpatient, emergency room, outpatient, and office-based provider events.
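The two outcome definitions can be illustrated with a minimal aggregation sketch. The record layout (`person_id`, `event_type`, `round`, `expenditure`), the event-type labels, and the function name `summarize_hcru` are hypothetical and do not correspond to actual MEPS variable names. The sketch also assumes that each person's reference period is identified by a single round number supplied by the caller.

```python
from collections import defaultdict

# Event types counted toward utilization frequency (four of the seven sources).
VISIT_TYPES = {"inpatient", "emergency_room", "outpatient", "office_based"}
# All seven sources contribute to total medical expenditures.
ALL_TYPES = VISIT_TYPES | {"prescribed_medicines", "home_health", "other_medical"}


def summarize_hcru(events, reference_round):
    """Return {person_id: (total_expenditure, utilization_frequency)}.

    events: iterable of dicts with hypothetical keys
        person_id, event_type, round, expenditure.
    reference_round: dict mapping person_id to the round whose events count.
    """
    totals = defaultdict(float)
    visits = defaultdict(int)
    for ev in events:
        pid = ev["person_id"]
        if pid not in reference_round or ev["round"] != reference_round[pid]:
            continue  # outside the person's reference period
        if ev["event_type"] not in ALL_TYPES:
            continue
        totals[pid] += ev["expenditure"]
        if ev["event_type"] in VISIT_TYPES:
            visits[pid] += 1
    # Persons with no qualifying events get (0.0, 0).
    return {pid: (totals[pid], visits[pid]) for pid in reference_round}


events = [
    {"person_id": "A", "event_type": "inpatient", "round": 5, "expenditure": 1200.0},
    {"person_id": "A", "event_type": "prescribed_medicines", "round": 5, "expenditure": 80.0},
    {"person_id": "A", "event_type": "office_based", "round": 4, "expenditure": 150.0},
    {"person_id": "B", "event_type": "outpatient", "round": 3, "expenditure": 300.0},
]
summary = summarize_hcru(events, {"A": 5, "B": 3, "C": 3})
print(summary)
```

Prescribed medicines add to the expenditure total but not to the event count, which mirrors the distinction between the two outcomes drawn in the text.
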

Statistical analyses

For each of the two HCRU outcomes, MEs and utilization frequency, models were developed separately for PCS, MCS, and SF-6D. In addition to the SF-12v2 score, all initial models (full base models) used age, gender, marital status, education, employment status, insurance status, and number of comorbid conditions as covariates. For each of the models, diagnostic tests were conducted to determine the best form of the generalized linear model (GLM) used to estimate the effect of SF-12v2 score differences on MEs and number of medical events. The GLM is a generalization of linear regression that allows for response variables that have non-normal error distributions and has the general form \(g(E[Y])=X\beta\). Under this approach, the dependent variable, \(Y\), is assumed to be generated from a particular distribution that approximates its mean and variance relationship. The function, \(g\), links the mean of \(Y\), \(E[Y]\), to the linear predictor, \(X\beta\) (with \(X\) the vector of independent variables and \(\beta\) the vector of regression coefficients). The modified Park test [26] was used to identify the distribution family that most closely described the mean–variance relationship of the data. The test identifies the best fitting distribution by comparing the coefficient from a residuals-based regression to the following values: 0 indicates Normal; 1 indicates Poisson; 2 indicates Gamma; 3 indicates Inverse Gaussian. Three additional tests that have been recommended [27] were conducted to identify the best link for the GLM: (1) Pearson correlation, (2) Pregibon link, and (3) modified Hosmer and Lemeshow.
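The idea behind the modified Park test can be illustrated with a simplified sketch. It regresses the log of the squared residuals on the log of the predicted means by ordinary least squares and reads the slope as the power relating variance to mean. This is a simplification for illustration, not the procedure run in Stata for this study: the published test is usually fitted as a GLM and accompanied by formal hypothesis tests, and the helper names `park_slope` and `nearest_family` are hypothetical.

```python
import math


def park_slope(y, mu_hat):
    """OLS slope of ln((y - mu_hat)^2) on ln(mu_hat).

    Requires positive predictions and non-zero residuals (log of zero is undefined).
    """
    xs = [math.log(m) for m in mu_hat]
    zs = [math.log((yi - m) ** 2) for yi, m in zip(y, mu_hat)]
    x_bar = sum(xs) / len(xs)
    z_bar = sum(zs) / len(zs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    return sxz / sxx


# Reference values listed in the text.
FAMILIES = {0: "Normal", 1: "Poisson", 2: "Gamma", 3: "Inverse Gaussian"}


def nearest_family(coef):
    """Pick the family whose reference value is closest to the coefficient."""
    return FAMILIES[min(FAMILIES, key=lambda k: abs(k - coef))]


# Toy data in which the residual is always half the mean, so variance ~ mean^2.
mu = [100.0, 400.0, 1600.0, 6400.0]
y = [m * (1.5 if i % 2 == 0 else 0.5) for i, m in enumerate(mu)]
slope = park_slope(y, mu)
print(round(slope, 3), nearest_family(slope))
```

Picking the nearest reference value is only a heuristic; coefficients that fall between two reference values call for the formal tests rather than this shortcut.
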

For MEs, the modified Park test indicated the Gamma to be the best distribution family (PCS coefficient = 1.94; MCS coefficient = 2.07; SF-6D coefficient = 1.94), with the log providing an appropriate link for the GLMs linking each of the SF-12v2 scores to MEs. For utilization frequency, diagnostic tests indicated that neither the Poisson nor the Normal distribution could be ruled out as adequate distributions for the GLM (PCS coefficient = 0.29; MCS coefficient = 0.34; SF-6D coefficient = 0.46) and that the log link was appropriate for the model. Additional tests for overdispersion indicated that the scale parameter in the Poisson model was significantly different from zero (χ2 = 1978.4, P < 0.001) and that this generalization of the Poisson model further improved model fit. Therefore, the usual Poisson model was adjusted to allow for overdispersion (with Pearson’s chi-squared as the adjustment to standard errors).
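A common way to apply a Pearson chi-squared adjustment is the quasi-Poisson approach: estimate a dispersion factor as the Pearson statistic divided by the residual degrees of freedom, then multiply the Poisson standard errors by its square root. The sketch below assumes that this is the kind of adjustment meant in the text; the function names and the toy numbers are illustrative only.

```python
import math


def pearson_dispersion(y, mu_hat, n_params):
    """Pearson chi-squared divided by residual degrees of freedom."""
    chi2 = sum((yi - m) ** 2 / m for yi, m in zip(y, mu_hat))
    return chi2 / (len(y) - n_params)


def adjust_se(se, dispersion):
    """Scale a Poisson standard error for over- or underdispersion."""
    return se * math.sqrt(dispersion)


# Toy counts with far more spread than a Poisson with mean 4 would produce.
y = [8, 0, 8, 0, 4]
mu_hat = [4.0] * 5
phi = pearson_dispersion(y, mu_hat, n_params=1)
print(phi, adjust_se(0.01, phi))
```

A dispersion factor near 1 leaves the Poisson standard errors unchanged; values well above 1 widen the confidence intervals without altering the point estimates.
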

After confirming the best distribution and link function, model refinement was carried out using at least one of the following procedures: (1) excluding, in turn, coefficients that showed no significant association with the HCRU outcome (age and gender were included in a final model regardless of statistical significance); (2) collapsing factor variables of redundant categories (i.e., those categories represented by model parameters with an associated P > 0.05); (3) adding model parameters representing the interaction between SF-12v2 PCS (or MCS or SF-6D) with age and gender to determine whether the effect varied according to either of these variables. After a parsimonious model was obtained (final model), a variable representing cancer types (with 30 or more patients) and its interaction with the HRQoL score was added to the model to evaluate possible differences in the effect of HRQoL on the outcome by type of malignancy. All analyses were conducted with Stata software (Version 13).


Sociodemographic and clinical characteristics

The MEPS sample of patients with active cancer (n = 647) was predominantly female (57%) and the mean age was 62 years (range 18–85 years; Table 1). Forty-five percent had at least a high school degree and about 20% had a college degree or higher. Slightly over half of the sample was married (51%) and about two-thirds (67%) were not working during the time of the MEPS interview. The SF-12v2 was completed by 87% of the sample of patients with active cancer. Overall, participants reported more physical (mean = 38.2, standard deviation (SD) = 13.7) than mental health impairment (mean = 46.9, SD = 11.8). The SF-6D score had a mean of 0.665 and an SD of 0.166. The four most frequently occurring cancer types in the MEPS sample were prostate, breast, skin (combined non-melanoma, unknown type, and melanoma), and lung. Nearly all participants (95%) had a single active malignancy. A relatively small percentage of the sample had 2 (4.6%) or 3 (0.6%) malignancies. Patients with lung cancer reported the worst impairment in scores (PCS: mean = 29.8, SD = 11.7; MCS: mean = 42.5, SD = 11.3; SF-6D: mean = 0.582, SD = 0.146) among all patients with cancer. They also had, on average, more comorbid conditions (mean = 6, SD = 5.7). Patients with prostate cancer reported the highest functioning (PCS: mean = 42.5, SD = 12.5; MCS: mean = 51.2, SD = 9.8; SF-6D: mean = 0.742, SD = 0.148) and fewer comorbid conditions (mean = 4.6, SD = 3.9) than other cancer groups.

Table 1 Sociodemographic characteristics of analysis sample

Medical expenditures and utilization frequency

The utilization frequency and 6-month MEs are summarized in Table 2 by cancer type. Patients with lung cancer had the highest total mean MEs ($15,974; median = $19,421; maximum = $69,289), followed by patients with “other,” breast, prostate, and skin cancer ($3943; median = $1323; maximum = $71,532). Patients with lung cancer also had the highest mean expenditures for emergency visits, outpatient visits, office-based visits, home health, prescriptions, and other medical costs. Cost of hospital inpatient care was the greatest contributor to total expenditures, with mean values that ranged between $11,411 (breast cancer) and $20,189 (“other”). The mean utilization frequency was also highest among patients with lung cancer, with 12 events, twice as high as the average number of events among patients with prostate and skin cancer. The maximum number of medical events was 68 for the entire sample, and the maximum was lowest among patients with breast cancer (31).

Table 2 Mean (SD) medical expenditure and utilization frequency in the 6-month period after HRQoL assessment, by cancer type

Association between SF-12v2 scores and medical expenditures

Table 3 shows the parameters for the final model linking PCS to MEs, which, in addition to PCS, included age, gender, education, marital status, cancer type, and an interaction term between cancer type and PCS. To obtain a more parsimonious model, parameters for the lung and breast cancer groups were constrained to zero given that neither of these groups significantly differed from the “other” cancers group in terms of MEs. Similarly, the interaction term indicated that the association between PCS and MEs was significantly different for patients with skin and prostate cancer when compared to patients in the “other” cancer category (P ≤ 0.05), but not for the lung and breast cancer groups (P > 0.05). PCS differences had the strongest impact among patients with prostate cancer, followed by patients with skin cancer, and a smaller impact for the remaining three types of cancer. After controlling for age, gender, education, and marital status, a one-point better (higher) PCS score was associated with 6% (relative risk [RR] = 1.060; 95% confidence interval [CI] [1.026, 1.094]) lower MEs in prostate cancer, 4% (RR = 1.045; 95% CI [1.021, 1.070]) lower MEs in skin cancer, and 1% (RR = 1.012; 95% CI [0.997, 1.026]) lower MEs in the remaining cancer groups (“other,” breast, lung). Age (P = 0.011) and marital status (P = 0.024) were significantly associated with MEs.

Table 3 Model parameters for final model linking PCS and medical expenditures, after inclusion of cancer types

The parameters of the final model linking MCS to MEs are displayed in Table 4. Age was significantly associated with MEs (P = 0.001). Overall, a one-point higher (better) MCS score was associated with approximately 2% (RR = 1.019; 95% CI [1.003, 1.034]) lower MEs and no meaningful/significant differences across cancer types were identified in the association between MCS and MEs.

Table 4 Model parameters for final model linking MCS and medical expenditures, after inclusion of cancer types

The parameters of the final model linking SF-6D to MEs are displayed in Table 5. Age (P = 0.001), marital status (P = 0.010), and health insurance (P = 0.004) were significantly associated with MEs. As in the case of the PCS model described previously, parameters for the lung and breast cancer groups were constrained to zero given their MEs did not differ significantly from those of the “other” group. For the SF-6D model, tests for model coefficients representing interactions between cancer groups and SF-6D utility score indicated that the association between the SF-6D and MEs was significantly different for patients with prostate cancer when compared to patients in the “other” cancer category (P = 0.003), but not for the remaining cancer groups (P > 0.05). SF-6D differences had the strongest impact in the prostate cancer group, with a 0.05-point higher (better) SF-6D score associated with approximately 30% (RR = 1.300; 95% CI [1.160, 1.440]) lower MEs. In other cancer groups, a 0.05-point higher SF-6D score was associated with approximately 8% (RR = 1.080; 95% CI [1.025, 1.135]) lower MEs.

Table 5 Model parameters for final model linking SF-6D and medical expenditures, after inclusion of cancer types

Figure 2, corresponding to Tables 3, 4, and 5, depicts the association between PCS and MEs (Fig. 2a), MCS and MEs (Fig. 2b), and SF-6D and MEs (Fig. 2c) for the five cancer types, with the lower panels of Fig. 2 showing the smoothed distribution of PCS, MCS, and SF-6D scores over the range of predicted MEs. For PCS, the prostate cancer type has the steepest line (strongest effect of PCS on MEs), followed by skin cancer, while the curves for the remaining three cancer types are parallel on the log scale (indicating 1% lower MEs for a 1-point better score across all three cancer groups). Differences in PCS score of the magnitude of the minimally important difference (MID) of three points established for the SF-12v2 [19] were associated with 3% lower MEs for patients in the “other,” “breast,” and “lung” cancer types, 12% lower MEs for “skin” cancer, and 17% lower MEs for “prostate” cancer. Differences in MCS of the magnitude of the MID (3 points) [19] were associated with 7.5% lower MEs across all cancer groups. For SF-6D, an MID of 0.041 has been proposed [28]. This score difference was associated with 24.0% (95% CI [1.130, 1.349]) lower MEs for “prostate” cancer and 6.5% (95% CI [1.021, 1.110]) lower MEs for other cancers. The distribution curves in the lower panels of Fig. 2 indicate that most patients with skin and prostate cancer were in the higher range of the PCS scale, while patients in the lung cancer group tended to be more concentrated in the lower PCS values (< 30). Differences across cancer groups in terms of MCS and SF-6D scores were similar to those observed for PCS scores, with patients with lung cancer more concentrated towards the lower end of the score range, whereas patients with skin and prostate cancer tended to be more concentrated in the higher values.
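Under a log link, a ratio estimated for one score difference can be rescaled to another by raising it to the power of the quotient of the two differences. The sketch below applies this to the SF-6D estimate for the "other" cancer groups, rescaling from the 0.05-point difference to the proposed MID of 0.041. It assumes that the MID-scaled figures in the text were obtained this way; the function name `scale_ratio` is hypothetical.

```python
def scale_ratio(ratio, delta, unit):
    """Rescale a ratio estimated per `unit` score points to `delta` points.

    Valid for log-link models, where effects are multiplicative in the score.
    """
    return ratio ** (delta / unit)


MID = 0.041  # proposed SF-6D MID cited in the text
UNIT = 0.05  # score difference used for the reported SF-6D ratios

# Reported estimate and 95% CI for the "other" cancer groups (per 0.05 points).
estimate, lower, upper = 1.080, 1.025, 1.135

scaled = [scale_ratio(r, MID, UNIT) for r in (estimate, lower, upper)]
print([round(r, 3) for r in scaled])
```

Rescaling the point estimate and both confidence limits with the same exponent keeps the interval consistent with the original model, because the transformation is monotone.
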

Fig. 2 Estimated medical expenditures as a function of HRQoL and utility scores and scores distribution (bottom panels), by cancer type

Association between SF-12v2 scores and utilization frequency

Table 6 presents the parameters of the final model linking PCS and utilization frequency. The model included PCS, age, gender, education, and cancer type. Utilization frequency for breast and prostate cancer types did not differ significantly from that in the “other” category. Therefore, the parameters for these categories were constrained to zero in the final model. PCS was significantly associated with utilization frequency (P = 0.003), with a one-point higher PCS associated with 1% (RR = 1.012; 95% CI [1.004, 1.020]) lower frequency. The interaction between PCS and cancer type did not reveal significant differences in the effect of PCS on utilization across cancer groups (χ2 = 1.61; P = 0.4466). MCS was not found to be significantly associated with utilization frequency (P = 0.956). The model for the SF-6D (Table 7) indicated that age (P = 0.022) and education (P = 0.014) were significantly associated with utilization frequency. As in the case of the PCS model described previously, interaction terms indicated no significant differences in the effect of SF-6D on utilization frequency across cancer groups, and a 0.05-point higher SF-6D score was estimated to be associated with approximately 4% (RR = 1.041; 95% CI [1.010, 1.073]) lower utilization frequency among all cancer groups (Fig. 3).

Table 6 Model parameters for final model linking PCS and utilization frequency, after inclusion of cancer types
Table 7 Model parameters for final model linking SF-6D and utilization frequency, after inclusion of cancer types
Fig. 3 Estimated utilization frequency as a function of physical component summary (PCS) and SF-6D utility score, by cancer type


This study found that PCS, MCS, and SF-6D scores from the SF-12v2 can be used to predict total MEs for patients with cancer. A one-point higher (better) PCS or MCS score was associated with approximately 2% lower MEs, while a 0.05 difference in SF-6D was associated with 9.5% lower MEs. Results also indicated that, for PCS and the SF-6D index, the association with MEs varied by cancer type. For patients with prostate and skin cancer, a one-point higher PCS was associated with 6% and 4% lower MEs, respectively. For the remaining cancer groups (“other,” breast, lung), a one-point higher PCS was associated with 1% lower MEs. A 0.05-point higher (better) SF-6D score was associated with approximately 30% lower MEs among patients with prostate cancer, while for other cancer groups, a 0.05-point higher SF-6D score was associated with approximately 8% lower MEs. Additionally, PCS and SF-6D, but not MCS, can also be used to predict utilization frequency. A one-point higher PCS score was associated with a 1% decrease in utilization frequency. Similarly, an SF-6D difference of 0.05 was associated with a 4% decline in utilization frequency. We found no evidence that either of these associations varied by cancer group. Overall, these results are in agreement with previous studies examining the association between PRO scores and HCRU in patients with cancer [10,11,12,13,14, 16, 29, 30]. Similarly, a recent study found a significant association between health-utility scores (EQ-5D) and social care needs [31].

While the association between PRO scores and HCRU in MEPS cancer survivors has been previously studied [12], the current study established this association specifically in subjects with active disease and used a specific longitudinal link between the time of health status assessment and the period in which MEs were captured, i.e., approximately 6 months following the SF-12v2 administration. Thus, unlike prior studies, these results demonstrated that future HCRU could be predicted by using current HRQoL status and utility scores reported by patients.

While the MCS score was significantly associated with subsequent total MEs, it was not significantly associated with utilization frequency. Another MEPS study [12] found that the effect of psychological distress on total MEs among cancer survivors is largely driven by office-based expenditures and prescription expenditures, the latter of which were not included in this study’s measure of utilization frequency. Similarly, a previous study [10] of the associations between various symptoms and emergency room visits found that physical symptoms (appetite, drowsiness, nausea, pain, shortness of breath, tiredness) were associated with emergency visits, while anxiety and depression were not.

While this study represents a promising first step in a sparsely researched area, it is important to be aware of the study limitations. Although the sample size was sufficient to estimate mean expenditures (at the 95% confidence level), group comparisons were limited by sample size. Thus, the results from subgroup analyses must be interpreted with caution. Future studies should seek to confirm such relationships in larger samples. In addition, the results may only be relevant to the non-institutionalized population, which is limiting for the target population of the study (patients with cancer not in remission).

In conclusion, this study aimed to improve interpretation of scores based on the SF-12v2 by translating differences in these scores to metrics familiar to medical decision makers, clinicians, and patients alike, such as MEs and utilization frequency. This information is expected to help guide interpretation of PRO data that are increasingly used in medical decision-making and health policy decisions. The policy-related implications of our research suggest that programs that improve the physical functioning, mental functioning, and HRQoL of patients with cancer could produce a significant and measurable reduction in their MEs.