Introduction

In the global context of ageing populations who are likely to experience multi-morbidity, there is an increasing drive towards integrated models of care that bring together formal health services, social care provision, and community-based services to support local population needs [1, 2]. The dual aims of integrated care are to implement a person-centred approach for supporting people with complex care needs, and to ensure the sustainability of health systems over the long term. Economic evidence is an essential component of integrated care evaluation, but the evidence must be relevant to all involved stakeholders. For example in England, the economic impacts of integrated care initiatives (including pooled budgets) between Clinical Commissioning Groups responsible for health care planning and Local Authorities responsible for social care provision need to be assessed via outcomes relevant for both health and social care (e.g. well-being, independence).

Most existing outcome measures for economic evaluation, such as the EQ-5D [3], are based on the construct of Health-Related Quality of Life (HRQoL). HRQoL focuses mainly on functional status and symptom burden, for example the extent to which someone is physically mobile or experiences depression. Research highlights the limitations of this for understanding what matters to those living with long term conditions [4, 5]. Alternatively, Social Care Related Quality of Life (SCRQoL), as measured by the Adult Social Care Outcomes Toolkit (ASCOT) [6], focuses on wider aspects of well-being such as safety, personal comfort, and social participation. However, HRQoL and SCRQoL are each arguably too narrow for evaluating integrated health and social care interventions, which will simultaneously seek to minimise symptom burden (and associated use of health resources) and to maximise the efficacy of social support for ‘living well’ in its broadest sense.

The Long-Term Conditions Questionnaire (LTCQ) is a 20-item patient-reported outcome measure (PROM) designed to measure ‘living well with long-term conditions’ across health and social care domains. It enables patients to self-report on the overall impact of living with one or multiple long-term conditions (LTCs), including physical and/or mental health conditions of varying severity. LTCQ was initially validated amongst a diverse sample of health and social care users in England [7], demonstrating strong psychometric properties of internal consistency, construct validity, and test-retest reliability. It has since been validated for use in memory clinics by patients with mild cognitive impairment or dementia [8], and it is being used to evaluate clinical interventions for schizophrenia [9]. LTCQ’s person-centred construct of ‘living well’ captures outcomes that are relevant for integrated care initiatives, e.g. sense of control over daily life, safety, confidence to self-manage illness. Further psychometric analysis using Rasch modelling was undertaken to identify 8 candidate items for a Short-Form LTCQ (LTCQ-8), which correlates strongly with the 20-item version and maintains unidimensionality of its core construct [10].

The Social Care Institute for Excellence in the United Kingdom advocates three types of outcomes for evaluating integrated care: personal improvement of health and well-being (measured by PROMs), service improvement of care quality (measured by patient experience measures and process indicators), and value and sustainability of the system (e.g. cost-effectiveness and timely delivery of services to those in greatest need) [11]. A generic PROM such as EQ-5D is advantageous for its concurrent use in quality-of-life and cost-effectiveness analyses - the EQ-5D remains the National Institute for Clinical and Care Excellence’s (NICE) preferred measure of health-related quality of life [12] - but it is conceptually limited in capturing outcomes of importance for integrated care. The content of LTCQ is potentially more appropriate in this context, but LTCQ currently has no associated utility values for economic analyses. The aim of this work was to develop a mapping algorithm for estimating EQ-5D utilities from LTCQ-8 responses, thus increasing LTCQ’s potential as a comprehensive outcome measure for evaluating integrated care initiatives.

Methods

This mapping study has been conducted following the MAPS (MApping onto Preference-based measures reporting Standards) statement [13]. The approach was also informed by ISPOR guidance on mapping from non-preference based outcome measures [14]. The valuation of the EQ-5D-5L was based on the UK value set [15] and the cross-walk to derive EQ-5D-3L utilities [16] following the latest position of National Institute for Health and Care Excellence (NICE) [12].

Datasets

Three different datasets were used in this study. The first dataset was derived from the validation study of the LTCQ, where 1211 patients with at least one of 11 specified LTCs (cancer, chronic back pain, COPD, diabetes, depression, irritable bowel syndrome, ischemic heart disease, multiple sclerosis, osteoarthritis, severe mental health conditions including schizophrenia, stroke) were recruited from geographically diverse regions in England representing urban and rural communities, as well as areas of high and low deprivation [7]. Approximately three-quarters of the participants (health care cohort) were recruited through 15 primary care (GP) practices in 3 regions (South East, North West, Yorkshire and Humber), and the remaining quarter (social care cohort) were recruited through 4 Local Authorities (in North West, East of England, South West, and Greater London). In addition to the 20-item LTCQ participants completed several established PROMs including the EQ-5D (5-level version), the Disease Burden Impact Scale (DBIS) through which participants indicated the name and perceived impact of each long-term condition that they had, and a range of demographic questions [17]. The version of the DBIS used allows for up to 25 LTCs to be reported, each on an impact scale from 0 (does not have the condition) to 5 (high daily impact of the condition), for a theoretical maximum impact score of 125. From the 1211 participants, 37 were removed from the mapping study because of missing observations in the EQ-5D-5L questionnaire and 48 patients due to missing observations in LTCQ responses, leaving 1126 patients included in the estimation data set.

The second dataset (n = 115) included the baseline data from the Feeling Safe Study, a randomised controlled trial of a psychological intervention that recruited people with schizophrenia in South East England (Oxfordshire, Northamptonshire, and Berkshire) between February 2016 and July 2019 [9]. Three patients were excluded from the 118 patients in the Feeling Safe Study dataset because of missing LTCQ items. The third dataset (n = 93) stemmed from further validation work to test LTCQ’s psychometric properties amongst people with cognitive impairment, who were minimally represented in the original validation study. From the 102 respondents with dementia or mild cognitive impairment recruited in 14 memory clinics in South East England between February and August 2018, 9 were excluded because of missing observations in EQ-5D-5L and/or LTCQ.

In all three datasets, participants completed the full 20-item LTCQ. Responses for the 8 short-form items were extracted for the analyses below. A reduced number of LTCQ items facilitated model convergence, and these items were identified through modern psychometric methods as the best-performing, conceptually independent items for representing LTCQ’s general construct of ‘living well with long-term conditions’ [10]. As LTCQ-8 may be used as a stand-alone measure for larger-scale studies in the future, we based the mapping models on these items only (Table 1).

Table 1 The LTCQ-8 questionnaire

Estimation dataset and internal validation

In the LTCQ validation sample, only 7% of respondents reported a severe mental health condition, and only 9 respondents reported dementia. Considering that the LTCQ is an instrument for all long-term conditions, including mental health conditions and dementia, we combined the three datasets to ensure that the estimation and validation samples for EQ-5D utilities mapping were representative of the wider LTCQ population. The inclusion of the mental health dataset allowed us to map LTCQ to low EQ-5D utilities, taking account of the major impact that severe mental conditions such as schizophrenia have on HRQoL [18, 19]. The dementia dataset was included in the estimation dataset to include the utilities of older people with affected memory problems and complex needs.

In each of the three datasets, we randomly selected 75% of the sample and combined the selected random samples to generate the estimation dataset, which consisted of 1001 patients (845 from LTCQ, 86 from mental health, 70 from dementia). The unselected 25% observations from each dataset were combined to generate an internal validation dataset of 333 patients.

Statistical analysis

We used direct mapping models by regressing responses to the LTCQ-8 directly onto EQ-5D-5L and EQ-5D-3L utilities as well as response (or indirect) mapping to predict the response level that patients selected for each of the five EQ-5D-5L domains. The predictor variables in each fitted model were 32 dummy variables indicating whether or not a patient had a particular response on each LTCQ-8 question. We performed a complete case analysis because imputation introduces an additional source of error that can affect the precision of the algorithms [20]. Considering also the low missing data (i.e. 3%) in the estimation dataset, imputation was unlikely to affect the comparison of the mapping models. No patient characteristics (e.g. gender) were added as covariates in the best performing model because missing observations could reduce prediction accuracy. We also wanted to ensure that the mapping algorithm can be applied to datasets that do not include patient characteristics. All statistical analyses were performed in STATA version 15.

Direct mapping models

Ordinary least square (OLS) regression is the most frequently used mapping method [21]. However, several theoretical limitations have been highlighted in the context of HRQoL data [22]. Most particularly the assumption under OLS that the data is continuously distributed means the likelihood of having being in full health, a value of 1, is low, whereas in practice this is a relatively common response (10% of all responses considered in this study report perfect health). Despite this, recent work has shown that OLS performs well in mean prediction [23]. Tobit models have been used in EQ-5D mapping studies as an alternative to OLS to better deal with the bounded nature of EQ-5D utilities [24, 25]. Thus, the lower and upper limits for a Tobit model were specified as the minimum and maximum possible score on the EQ-5D-5L scale (i.e. -0.285 and 1 respectively) and on the EQ-5D-3L scale (i.e. -0.594 and 1 respectively). In the presence of heteroscedasticity, non-normality, and censoring, censored least absolute deviations (CLAD) models provide consistent estimates [23] and thus, it has been used extensively in HRQoL mapping literature [21, 26]. The limits and dependent variable in this model were specified as in the Tobit model. The observed EQ-5D-5L utility was used as the dependent variable in the OLS, Tobit and CLAD models.

Generalised Linear Model (GLM) allows for the skewed distribution of EQ-5D utility data and prevents prediction of utilities higher than 1 [27]. We specified two GLMs, one with Gamma family and log link and another with Gaussian family and log link, to predict EQ-5D-5L disutility. To fit these models, the dependent variable was transformed as 1 − Utility. Furthermore, fractional logistic regression was used to constrain utility predictions between − 0.281 and 1 for EQ-5D-5L utilities, and between − 0.594 and 1 for EQ-5D-3L utilities. The dependent variable in the regression with the EQ-5D-5L utility was

$$ {Utility}_{0-1}=\left( Utility+0.285\right)/1.285 $$
(1)

Two-part models have been used in utility mapping studies to allow for a relatively large proportion of observations reporting perfect health on EQ-5D (i.e. utility of 1) [22]. We therefore, specified two two-part models. In the first part of these models, a logistic regression was used to estimate the probability an individual to report perfect health (i.e. EQ-5D-5L utility = 1). The second part applied OLS regression in one two-part model and binomial beta regression in another two-part model to utilities less than 1. Binomial beta regression was performed because of its ability to deal with left and right skewed HRQoL data [28]. The dependent variable in the binomial beta regression was specified similar to the fractional logistic regression.

Response mapping models

Several models have been proposed in the literature to predict responses to EQ-5D dimensions [22], the most frequently used are multinomial and ordinal logistic regressions [21]. A multinomial logit and an ordinal logit were specified to estimate responses based on the expected value method [22, 29]. These regression models were applied to each dimension of the EQ-5D-5L and used to calculate the probabilities of responding to each of the five levels. The expected dis-utilities (inverse of utility) in each dimension were calculated by using the estimated response probabilities and the EQ-5D-5L UK valuation set [15].

Assessment of model performance

Predicted EQ-5D-5L utilities were estimated for each mapping model with back-transformation applied to the GLM models, fractional logistic regression, and binomial beta regression. For the two-part models, expected utility was estimated as:

$$ Utility=\Pr \left( Utility=1\right)+\Big(1-\Pr \left( Utility=1\right)\ast U $$
(2)

where U is the utility conditional on imperfect health (i.e. < 1) estimated in the second part of the models.

The mapping models were compared in terms of their ability to accurately predict EQ-5D-5L utility based on common metrics of predictive performance. Mean squared error (MSE) indicated goodness-of-fit and mean absolute error (MAE) measured individual-level prediction accuracy. These metrics were calculated in the estimation sample as well as in the two external validation models separately and jointly. Scatterplots with the observed and predicted EQ-5D-5L and EQ-5D-3L utility were created for each mapping model.

After the best performing model was selected, we investigated whether it performed equally well across different levels of morbidity and between males and females. This was done by plotting MSE across the deciles of the DBIS Score by gender. This was done using only the original LTCQ validation dataset because the DBIS Score was not available in the other two datasets.

Results

The mean EQ-5D-5L utility in the dataset from the validation study of the LTCQ was 0.616 (SD: 0.320). A similar mean utility was recorded in the baseline dataset from the Feeling Safe study among a sample with mental health conditions (mean: 0.605, SD: 0.219) and a higher utility recorded in the sample with dementia or mild cognitive impairment from the additional validation of the LTCQ (mean: 0.785, SD: 0.205). In our combined estimation sample of 1001 participants from across the datasets, the mean score was 0.623 (SD: 0.312). The mean DBIS score, only recorded for the full LTCQ validation sample, was 15.936 (SD: 12.814).

Table 2 provides the responses for the EQ-5D-5L and LTCQ-8 questions by dataset. Figure 1 provides the distribution of scores on the EQ-5D-5L and the EQ-5D-3L by dataset. This illustrates the bounded and skewed distribution of the utilities.

Table 2 Responses on EQ-5D-5L and LTCQ-8 questions by dataset
Fig. 1
figure 1

Distribution of EQ-5D-5L and EQ-5D-3L (using crosswalk) utility in each dataset

Direct mapping

Table 3 shows the performance of the various models when estimating EQ-5D-5L and EQ-5D-3L utilities within the estimation dataset. The two-part model with OLS was the best performing model based on the mean squared error (0.038) and mean absolute error (0.147) when estimating the EQ-5D-5L, as well for EQ-5D-3L utilities (MSE: 0.052, MAE: 0.172). The order of rank was largely consistent based on either the mean squared error or mean absolute error and between utilities. The mean squared error was consistently lower for the 5 L utility compared to the 3 L. Of the possible response mapping models, the multinomial logit was the better performing (MSE: 0.045; MAE: 0.155).

Table 3 Model performance based on the estimation sample (n = 1001)

Figure 2 shows the performance of the two-part OLS model as measured by the mean squared error across deciles of the DBIS scores. This confirms, as per Table 3, that the model performs better on the EQ-5D-5L compared to the 3 L. It also shows that model performance deteriorates among those with higher DBIS scores, representing high severity and/or multiple conditions, especially among males.

Fig. 2
figure 2

Mean Squared Errors across deciles of the DBIS score by gender based on the LTCQ validation dataset (n = 1126). Notes: Bars show the standard deviation of the mean Mean Squared Error in each decile; Dashed lines show the outer fence of 2 standard deviations from the mean Mean Squared Error in each decile; Dots show outliers with Mean Squared Error larger than one Standard Deviation of the mean Mean Squared Error in each decile

Table 4 provides regression coefficients for the best performing model, the two-part model with OLS. The results can be used to estimate the EQ-5D utilities based on a set of responses to the Short-Form LTCQ according to Eq. 2:

$$ Utility=\Pr \left( Utility=1\right)+\Big(1-\Pr \left( Utility=1\right)\ast U $$
(2)
Table 4 Regression coefficients for the best model: two-part with OLS

For instance, in a set of responses for an individual responding “living well” throughout the LTCQ, the probability of being in perfect health (utility of 1) is given by the sum of the constant and the coefficients for “living well” across each domain for part 1 – note, this needs to be transformed by taking the inverse of the logistic function. The predicted second-part utility is given by the sum of the constant and respective domain coefficients from the part 2 estimates. Finally, the predicted EQ-5D utility is given by substituting the above values into Eq. 2, producing a utility estimate of 0.93 for the EQ-5D-5L (see Appendix 2 for a detailed calculation).

Out-of-sample prediction

When we fitted the coefficients to the validation sample (n = 333), the mean predicted EQ-5D-5L utility from the two-part OLS model was 0.64 (SD: 0.25), compared to the recorded mean of 0.63 (SD: 0.30). The mean predicted EQ-5D-3L was 0.55 (SD: 0.28), compared to a recorded 0.53 (SD: 0.34). The mean squared error of our predicted value was 0.040 for the EQ. 5D-5L and 0.0582 for the EQ. 5D-3L; the mean absolute error was 0.1520 and 0.1848, respectively.

Response mapping

Appendix 3 reports the multinomial logit results for each of the EQ-5D questions with the 8 items of the short-form LTCQ as the predictor variables. The McFadden pseudo-R2 indicates that the models perform fairly well, particularly in the self-care, usual activity and anxiety/depression domains. The results were broadly intuitive. In several instances, the result for a given LTCQ-8 question most related to an EQ-5D domain is significant; for example, the LTCQ-8 question on feeling dependent is significant across a number of EQ-5D domains but particularly for usual activities and less so for anxiety and depression.

Discussion

This study provides a mapping algorithm for estimating EQ-5D utilities from LTCQ-8 responses. We have shown that our approach produces estimated utility scores that closely approximate what would have been recorded had the EQ-5D been administered, particularly for individuals outside of the most severe health states. The results from this study can help extend the applicability of the LTCQ by estimating utility values for use in economic analyses.

In this paper, we presented a broad range of potential models. Consistent with previous studies, a simple OLS model performed well for predicting average utilities, but the two-part model with OLS emerged as the best-performing model. The two-part model was proposed to better reflect the bounded nature of EQ-5D scores. The mean squared errors and mean absolute errors recorded in our models were similar to that reported elsewhere [30]. The small differences in performance between several models is consistent with research elsewhere [22, 31], as well as with the finding that the validity of mapping algorithms is more strongly a function of the particular instrument and disease or patient group rather than the specific technique employed [32].

We mapped our results onto both EQ-5D-5L and EQ-5D-3L utilities. The NICE position statement from November 2018 affirms the present need to use the EQ-5D-3L for reference-case analyses and recommends mapping to the EQ-5D-3L if EQ-5D-5L data had been collected [12]. EQ-5D-3L values in this study were thus derived using the recommended crosswalk [16]. Our models performed better on the EQ. 5D-5L than the 3 L, implying that the greater sensitivity afforded by the additional levels in the EQ-5D-5L helped achieve improved model fit.

In addition, we mapped a response model (multinomial logit) that indicates how responses to questions from the LTCQ-8 impact across the dimensions of the EQ-5D-5L. This mainly provided intuitive results, with responses to particular questions affecting appropriate domains. For instance, feeling more in control of daily life was significantly associated with being in a less severe state in the mobility domain. Although, there were also some seemingly counter-intuitive signs, for instance ‘always feeling safe at home’ was a positive and significant predictor of being in the most severe state in the mobility domain relative to the least severe. Albeit, this could reflect the person being in adjusted accommodation where they do feel safe. A larger validation dataset would help to clarify this.

Similarly, a validation dataset with more severe health states (lower utilities) would help assess model performance where it was weakest, although worsening predictive performance for patients in severe states is consistent with other research [30]. Similarly, since utilities are often required for specific subgroups of patients, further validation could occur with populations dissimilar from those featured in this paper.

Conclusions

As a result of the mapping algorithm produced in this study, surveys that include the LTCQ, or the short form LTCQ-8, can now be used to estimate utility values for economic analyses. Although the existence of a mapping algorithm is not an argument against the inclusion of direct preference-based measures in prospective studies - mapping is a second-best solution to direct measurement [33] - the ability to generate estimated utilities should increase the potential of the LTCQ for use as a comprehensive outcome measure for evaluating integrated care initiatives.