Introduction

Health-related quality of life is impaired in patients with chronic obstructive pulmonary disease (COPD), and it deteriorates significantly as disease severity increases1. In addition to causing disability, the disease generates high healthcare costs and a heavy socioeconomic burden2. In 2013, the prevalence of COPD among people aged over 40 years in Taiwan was 6.1%3, and COPD was the seventh leading cause of death in 2018. The estimated loss of life expectancy in patients at moderate and severe stages of the disease was 6.2 and 9.4 years, respectively4.

COPD-specific health-related quality of life instruments, including the CAT (COPD Assessment Test), CCQ (Clinical COPD Questionnaire), and SGRQ (St George's Respiratory Questionnaire), were designed to reliably assess the impact of the disease on patients. The CCQ and CAT have the advantage of being simpler to administer5. Furthermore, the CAT has been introduced as a tool for differentiating the severity of COPD in patients6, and its use has been extended to many clinical studies and practice guidelines7. This short, self-administered questionnaire consists of eight items: coughing, phlegm, chest tightness, breathlessness, home activities, leaving home, sleep problems, and energy. Each item is scored on a six-level scale from 0 to 5, so the total CAT score ranges from 0 to 40.
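The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the item keys and the function name `cat_total` are hypothetical labels chosen here, not part of the official instrument.

```python
# Hypothetical item keys for the eight CAT items listed in the text.
CAT_ITEMS = (
    "cough", "phlegm", "chest_tightness", "breathlessness",
    "home_activities", "leaving_home", "sleep", "energy",
)

def cat_total(responses: dict) -> int:
    """Sum the eight item scores (each 0-5) into a total score (0-40)."""
    missing = [item for item in CAT_ITEMS if item not in responses]
    if missing:
        raise ValueError(f"missing CAT items: {missing}")
    for item in CAT_ITEMS:
        if responses[item] not in range(6):
            raise ValueError(f"{item} must be an integer from 0 to 5")
    return sum(responses[item] for item in CAT_ITEMS)
```

A respondent answering 0 on every item scores 0, and one answering 5 on every item scores 40, which matches the stated range.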

The CAT cannot be used directly for quality adjustment when measuring quality-adjusted life-years in a cost-utility analysis8. Therefore, mapping algorithms that estimate European Quality of Life-5 Dimensions (EQ-5D) utilities from the CAT have been investigated9,10. However, the mapping algorithm developed by Hoyle et al. in the UK was reported to overestimate utilities in COPD patients at advanced stages of the disease9. A similar overestimation occurred with the mapping algorithms created by Lim et al. in South Korea, and even greater underestimation of utilities was found for mild COPD cases10. The aim of this study was thus to explore a suitable mapping algorithm for COPD patients in Taiwan.

Methods

The Institutional Review Board of National Cheng Kung University Hospital (NCKUH) approved this study before commencement (IRB number: B-ER-98-289 and B-ER-111-254). Informed consent was obtained from all subjects, and all methods were carried out in accordance with the relevant guidelines and regulations of the research ethics committee.

This study enrolled 323 patients who were diagnosed with COPD at the outpatient Pulmonary Medicine Clinic of National Cheng Kung University Medical Center from April 2017 to December 2020. All patients were enrolled in the COPD pay-for-performance program and had been receiving regular medical treatment for COPD for more than three months. COPD was defined according to the GOLD diagnostic guideline and criteria7. All pulmonary function tests were performed according to a joint consensus of the American Thoracic Society and the European Respiratory Society11. Patients who were unwilling to participate, were unable to join the pay-for-performance program (for example, bed-ridden patients), or had advanced lung cancer and pulmonary fibrosis were excluded.

Participants were classified using the GOLD 2017 classification and were divided into four stages (mild to very severe), which corresponded to the GOLD 2017 grades 1 to 4, based on the post-bronchodilator forced expiratory volume in one second (FEV1): grade 1 or mild stage (FEV1 ≥ 80%), grade 2 or moderate stage (50% ≤ FEV1 < 80%), grade 3 or severe stage (30% ≤ FEV1 < 50%) and grade 4 or very severe stage (FEV1 < 30%)7. In this study, the participants with FEV1 < 50% were incorporated into the “severe” category to obtain a sufficient number in the sample for estimation.
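The staging thresholds above, together with the merging of grades 3 and 4 into a single "severe" category, can be expressed as a short Python sketch. The function names are hypothetical, and the input is assumed to be the post-bronchodilator FEV1 expressed as a percentage of the predicted value.

```python
def gold_grade(fev1_percent: float) -> int:
    """Return the GOLD grade (1-4) from FEV1 as % of predicted."""
    if fev1_percent < 0:
        raise ValueError("FEV1 percentage cannot be negative")
    if fev1_percent >= 80:
        return 1  # mild
    if fev1_percent >= 50:
        return 2  # moderate
    if fev1_percent >= 30:
        return 3  # severe
    return 4      # very severe

def study_category(fev1_percent: float) -> str:
    """Collapse grades 3 and 4 into 'severe', as done in this study."""
    labels = {1: "mild", 2: "moderate"}
    return labels.get(gold_grade(fev1_percent), "severe")
```

For example, an FEV1 of 45% and an FEV1 of 25% fall in different GOLD grades (3 and 4) but share the "severe" category used for estimation.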

The quality of life of the COPD patients was consistently monitored with the EQ-5D-3L and the CAT in order to develop an algorithm for estimating EQ-5D equivalent utilities from the CAT. The validated Taiwanese version of the EQ-5D-3L and the Chinese version of the CAT questionnaire were used in this study12,13.

The questionnaires were administered by the case manager of the pay-for-performance program at the outpatient department.

Model development

The COPD patient dataset was randomly split into a training group of 160 patients and a validation group of 163 patients. Although the predictive models were built on the training group, the coefficients of the final model were derived from the full sample (all 323 patients) to obtain the most accurate estimates. In this study, we considered two OLS-based procedures to build predictive models for COPD patients. The first was the model recommended by Hoyle et al.9, who regressed the EQ-5D utility on the 8 CAT item scores and retained the 4 items (chest tightness, activities, confidence, and energy) with p-values smaller than 0.05 to build the final model. We created modified versions of the models by Hoyle et al. and Lim et al. that better fit the Taiwanese population. A backward elimination procedure with a type I error rate of 0.05 for each hypothesis test was applied to obtain a final parsimonious model.
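A patient-level random split of this kind can be sketched as follows. This is a minimal illustration, not the procedure actually used by the authors: the function name, the seed, and the use of Python's `random` module are assumptions. Splitting on patient identifiers keeps all repeated measurements of one patient in the same group.

```python
import random

def split_patients(patient_ids, n_train, seed=0):
    """Randomly assign unique patient IDs to training and validation sets."""
    unique_ids = sorted(set(patient_ids))
    if not 0 <= n_train <= len(unique_ids):
        raise ValueError("n_train must be between 0 and the number of patients")
    rng = random.Random(seed)  # fixed seed (hypothetical) for reproducibility
    train = set(rng.sample(unique_ids, n_train))
    validation = set(unique_ids) - train
    return train, validation

# Example with 323 hypothetical patient IDs, mirroring the 160/163 split.
train_ids, validation_ids = split_patients(range(323), n_train=160)
```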

Response mapping is another feasible approach that can be used for utility prediction14,15. While OLS aims to predict the EQ-5D utility directly, response mapping aims to predict the five EQ-5D dimension scores, each taking a value of 1, 2, or 3. The five predicted scores are then transformed into a utility. The transformation formula varies across underlying populations. In this study, the formula based on the Taiwanese population, obtained from Lee et al., was applied16. The formula was: EQ-5D-3L utility = 1 - 0.185 - 0.123*Mobility at level 2 - 0.272*Mobility at level 3 - 0.167*Self-care at level 2 - 0.276*Self-care at level 3 - 0.085*Usual activities at level 2 - 0.208*Usual activities at level 3 - 0.121*Pain/discomfort at level 2 - 0.261*Pain/discomfort at level 3 - 0.154*Anxiety/depression at level 2 - 0.282*Anxiety/depression at level 3 - 0.190*Any dimension on level 3.
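The transformation from five dimension levels to a utility can be sketched in Python using the coefficients quoted above. One point is an assumption rather than something the formula states: the constant 0.185 is applied here only when at least one dimension departs from level 1, following the usual EQ-5D-3L tariff convention, so that full health (11111) maps to a utility of 1. The dictionary keys and function name are hypothetical.

```python
# Decrements copied from the formula in the text (value set of Lee et al.).
DECREMENTS = {
    "mobility": {2: 0.123, 3: 0.272},
    "self_care": {2: 0.167, 3: 0.276},
    "usual_activities": {2: 0.085, 3: 0.208},
    "pain_discomfort": {2: 0.121, 3: 0.261},
    "anxiety_depression": {2: 0.154, 3: 0.282},
}
CONSTANT = 0.185  # assumption: applied only when any dimension is above level 1
N3 = 0.190        # applied when any dimension is at level 3
DIMENSIONS = tuple(DECREMENTS)

def eq5d3l_utility(state: dict) -> float:
    """Convert a five-dimension EQ-5D-3L state (levels 1-3) to a utility."""
    levels = [state[dim] for dim in DIMENSIONS]
    if any(level not in (1, 2, 3) for level in levels):
        raise ValueError("each dimension must be at level 1, 2, or 3")
    if all(level == 1 for level in levels):
        return 1.0  # full health
    utility = 1.0 - CONSTANT
    for dim, level in zip(DIMENSIONS, levels):
        if level > 1:
            utility -= DECREMENTS[dim][level]
    if 3 in levels:
        utility -= N3
    return round(utility, 3)
```

Under these assumptions, a state with only mobility at level 2 gives 1 - 0.185 - 0.123 = 0.692.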

Furthermore, because each EQ-5D score takes discrete values, multinomial logistic regression (MLR) is a suitable model for identifying the association between the score and the covariates (the 8 CAT item scores, age, and sex). In our dataset, most patients completed both the EQ-5D and CAT questionnaires multiple times during the follow-up period. In other words, the data consisted of repeated measurements. Therefore, a generalized estimating equation (GEE) with an independent working correlation was applied for estimation and hypothesis testing17. For each EQ-5D score prediction, the final multinomial logistic regression was chosen so that the resulting QIC was minimized18. The GEE models were fitted using the SAS GEE procedure.

We also applied the Mean Rank Method (MRM), developed by Wee et al.19, as another method for developing a predictive model that maps CAT scores to EQ-5D utilities. The MRM uses nonparametric matching between EQ-5D and CAT scores, which avoids potentially erroneous model assumptions but provides less interpretable information.

Validation

For all applied methods (OLS, MLR, and MRM), the training data were used to build the predictive models, whereas the validation data were used to evaluate the performance of these models via both the root mean squared error (RMSE) and the mean absolute error (MAE). In addition, the models of Hoyle et al. and Lim et al. were modified by re-estimating their coefficients using the training group and were validated on the validation group for comparison.
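For reference, the two error measures can be computed as in the following Python sketch. The function names and the toy values are illustrative only and are not taken from the study data.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted utilities."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(observed, predicted):
    """Mean absolute error between observed and predicted utilities."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(absolute) / len(absolute)

# Toy example (made-up values): errors of 0, 0.2, and 0.4.
observed = [1.0, 0.5, 0.0]
predicted = [1.0, 0.7, 0.4]
```

Because the RMSE squares each error before averaging, it penalizes a few large errors (such as those for low-utility patients) more heavily than the MAE does, which is why the two measures can rank models differently.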

The model with the best predictive ability was expected to have the smallest RMSE and MAE. Additionally, to visualize potential prediction biases, we used bubble charts drawn with R version 4.2.1 statistical software (R Core Team, 2021; R Foundation for Statistical Computing, Vienna, Austria; https://www.R-project.org/). The best model should place the majority of bubbles on the diagonal line of the bubble chart. For statistical comparisons among groups, continuous variables were analyzed with the t-test, and categorical variables were analyzed with the chi-square test. All statistical analyses were conducted using SAS version 9.4 statistical software (SAS Institute Inc., Cary, NC, USA).

Results

In this study, 323 patients were recruited, and 2327 repeated measurements were collected. The mean number of visits per patient was seven, with a range of one to sixteen visits. Overall, the mean EQ-5D-3L utility and the mean CAT total score were 0.917 and 9.88, respectively. The EQ-5D-3L utility and CAT total scores were similar at baseline between the training and validation groups, and no differences were found in the distribution of patient characteristics (Supplementary Table 1).

A negative correlation (-0.69) between the EQ-5D-3L utility and the total CAT score was observed (Fig. S1). The distributions of the EQ-5D-3L utility and CAT scores from all eligible patients are presented in Fig. S2. The largest cluster was located at EQ-5D utility = 1: n = 1624 (69.79% of observations). The other clusters were 0.5 ≤ EQ-5D < 1: n = 638 (27.42%) and 0 ≤ EQ-5D < 0.5: n = 65 (2.79%).

Model development

Two MLR models were produced from the training group, using either the total CAT score or selected CAT items. The coefficients of the final model were derived from all 323 patients to obtain the most accurate estimates (Supplementary Table 2). The formulas for the models developed in this study, together with the models developed by Hoyle et al. and Lim et al. and their modified versions, are presented in Table 1. Estimated utility scores are presented as the mean, maximum, and minimum for each model within subsets defined by EQ-5D utility values, CAT scores, or FEV1 stages. The RMSE and MAE were calculated for each subset to examine the distribution of errors across the different disease severities (Table 2). Both models overestimated the mean EQ-5D utility among patients categorized as having poor health (0 ≤ EQ-5D < 0.25 and 0.25 ≤ EQ-5D < 0.5).

Table 1 Formula for current developed models.
Table 2 Comparison of predicted utility scores using the various models.

Compared with the models developed by Hoyle et al., Lim et al., and Wee et al., the total CAT score and selected CAT items models showed non-inferior predictive effectiveness based on the MAE and RMSE results (Table 3). According to the real-predictive bubble charts, the method developed in this study showed predictive capability comparable to that of the other models, and all of the models had good predictive accuracy for COPD patients with better health status (Fig. 1).

Figure 1a shows the real-predictive bubble chart for the formulas developed in this study. The selected CAT items model gave more precise predictions than the total CAT score model, and predictive accuracy was better for patients with higher EQ-5D utility. Figure 1b shows the real-predictive bubble chart for the formulas recommended by Hoyle et al. applied to the Taiwan datasets, and Fig. 1c shows the predicted EQ-5D utility from the modified Hoyle et al. model, which was developed using the CAT scores in this study and the OLS (ordinary least squares) equations from Hoyle et al. The M6_OLS equation overestimated utility for patients with lower utility (utility ≤ 0.5) and underestimated it for patients in near-full health (utility > 0.9). The M3_OLS equation had better predictive power than the M6_OLS equation for the Taiwan datasets, but it overestimated utility for patients with lower EQ-5D utility (utility ≤ 0.5).

Figure 1d shows the real-predictive bubble chart for the formulas recommended by Lim et al. applied to the Taiwan datasets, and Fig. 1e shows the modified Lim et al. model. In both models, the CAT items equation showed better predictive effectiveness for patients with higher EQ-5D utilities. However, both the CAT total score and CAT items equations showed poor predictive power, with overestimation, for patients with lower EQ-5D utilities (utility ≤ 0.6). Figure 1f shows the real-predictive bubble chart for the mean rank method recommended by Wee et al. applied to the Taiwan datasets. Its accuracy was similar to that of the model developed in this study (Fig. 1a). The overestimation for low-utility patients and the underestimation for near-health patients seen in the models of Hoyle et al. and Lim et al. were reduced in the presented model and in the model developed by Wee et al.

Table 3 Comparison of predicted utility scores using the current developed models.
Figure 1

Bubble chart for actual and predicted utility. The real-predictive bubble charts present the distribution of the actual EQ-5D-3L utility against its predicted value based on the developed models. The charts show predicted utilities on the X axis and observed utilities on the Y axis. The bubble sizes and colors depict the number of observations at each point, where bigger bubbles mean a larger sample size. Small, medium, and large bubbles are colored blue, pink, and yellow, respectively. The more bubbles are located on or adjacent to the diagonal line, the more accurate the EQ-5D utility prediction. An acceptable fit requires that a greater number of large bubbles are located within a suitable margin along the diagonal line. (a) The real-predictive bubble chart using the formulas based on CAT total scores and CAT items developed in this study. (b) The real-predictive bubble chart using the formulas recommended by Hoyle et al. applied to the Taiwan datasets (M6_OLS and M3_OLS). (c) The real-predictive bubble chart presenting the predicted EQ-5D utility from the modified Hoyle et al. model, which was developed using the CAT scores in this study and the OLS (ordinary least squares) equations from Hoyle et al. (M6_OLS and M3_OLS). (d) The real-predictive bubble chart presenting the predicted EQ-5D utility from the formulas using the total CAT scores and CAT items recommended by Lim et al. applied to the Taiwan datasets. (e) The real-predictive bubble chart presenting the predicted EQ-5D utility from the modified Lim et al. model, which was developed using the total CAT scores and CAT items in this study and the OLS (ordinary least squares) equations from Lim et al. (f) The real-predictive bubble chart presenting the predicted EQ-5D utility from the mean rank method (MRM) recommended by Wee et al. applied to the Taiwan datasets.

Discussion

In this study, backward elimination with response mapping was used to develop the formulas that transform CAT scores into EQ-5D scores. The utility weights of the EQ-5D were then estimated from these EQ-5D scores using time trade-off-based weights. The selected CAT items model performed better than the CAT total score model in terms of estimation effectiveness.

The RMSE and MAE are standard measures for choosing the best among several models. Therefore, both were used to evaluate the predictive ability of the models developed in this study. Compared with the models of Hoyle et al.9 and Lim et al.10, the selected CAT items model showed comparable predictive effectiveness based on the RMSE and MAE results.

However, the previous models severely overestimate EQ-5D utilities when the true utilities are relatively low9,10. By contrast, the model proposed in this study was more appropriate for patients with COPD in Taiwan. In the bubble charts, the overestimation of EQ-5D utilities for observations with low actual utility (< 0.5) was more pronounced with the previous models when applied to these datasets.

With the model recommended by Hoyle et al., EQ-5D utility was overestimated for COPD patients with poor health (utility < 0.7) or extremely severe cases (CAT: 31 to 40), and it was underestimated for COPD patients in near-full health (utility > 0.9). With the model proposed by Lim et al., EQ-5D utility was likewise overestimated for patients at the very severe stage of COPD and underestimated for patients at the mild stage.

In the literature, OLS-based approaches are popular for mapping disease-related measurements onto the EQ-5D utility9,10. Hoyle et al. recommended the M3_OLS, an OLS algorithm using the CAT profile. Compared with this algorithm, after its coefficients were re-estimated using the data from this study, the MLR model had a higher RMSE but a lower MAE. Other mapping algorithms for EQ-5D-3L utility prediction in COPD patients, the OLS1 and OLS3 models recommended by Lim et al., were also applied to the datasets in the present study. Compared with these models, again after their coefficients were re-estimated with the data from this study, the MLR model showed an even higher RMSE but a lower MAE. When the mapping algorithms were ranked by MAE and RMSE, the model developed in this study was found to be comparable with the models developed by Hoyle et al. and Lim et al. However, when the models were compared using the real-predictive bubble charts, our model had better predictive effectiveness among patients with poor utility and patients in near-full health.

The model recommended in the present study exhibited better predictive power than the other models in mapping EQ-5D utility from the CAT for COPD patients in Taiwan. However, overestimated EQ-5D utilities were still observed for patients with poorer health status (actual utility < 0.5). This large prediction bias may have been due to the small sample size of very severe COPD patients or to heterogeneity arising from COPD severity. This phenomenon shows that choosing the best model by considering only one or two indices may lead to an unexpected result.

The time trade-off-based coefficients used to estimate the quality weights of EQ-5D health states differ from country to country20,21. This difference might be due to the sociodemographic background of the respondents and to methodological differences between studies21. Consequently, there may be no single appropriate model for developing mapping algorithms. For example, respondents in South Korea put more weight on the mobility and self-care domains than on the other three dimensions22. For UK respondents, the pain/discomfort domain was considered more important than the other four dimensions23. Taiwanese respondents placed more weight on the anxiety/depression domain than on the others16.

Apart from the MAE and RMSE results, according to the real-predictive bubble charts, the method developed in this study shows predictive capability comparable to the models developed by Hoyle et al. in the UK and Lim et al. in South Korea. The models from the UK, South Korea, and Taiwan all predicted accurately for COPD patients with better health status. By contrast, the UK and South Korea models performed worse than the present models for COPD patients with poorer utility and for those in near-full health. We also applied the MRM, developed by Wee et al., as another method for developing a predictive model that maps CAT scores to EQ-5D utilities. Based on the real-predictive bubble charts, the predictive ability of the MRM model was better than that of the UK and South Korea models and similar to that of the present models. Therefore, other methods, even those not originally developed for mapping EQ-5D utilities from the CAT, should be tried in the future to find the best predictive model for different populations.

Conclusions

Response mapping with the MLR model and the model using the MRM have performance comparable to that of the OLS models for predicting EQ-5D utility from the CAT in Taiwan. In addition, the overestimation for low-utility patients and the underestimation for near-health patients seen in previously developed OLS models were reduced in the presented models and in the model using the MRM. However, it is better to administer both the CAT and the EQ-5D-3L if a cost-utility analysis is planned for a clinical trial or study; mapping should be the last resort, as it can only give an approximate utility value.