Plain English summary

Deciding which healthcare interventions to invest in can be challenging, especially when considering the diverse impacts of different health conditions. For young people with life-long conditions like cerebral palsy (CP) who cannot walk independently, their health impacts differ from those with conditions like asthma. To help in decision-making about where to invest, health economists have developed and employed an outcome metric called “quality-adjusted life years” (QALYs). QALYs are usually measured using questionnaires that assess health-related quality of life and are scored using special mathematical formulas that take into account what the person values in different areas of life that the questionnaire measures. However, these questionnaires are not commonly used in CP research. This study searched for statistical equations to convert scores from the Caregiver Priorities and Child Health Index of Life with Disabilities (CPCHILD) questionnaire, which is a commonly used instrument for children with CP who cannot walk by themselves, into Child Health Utility 9D utilities for QALY calculation. Two proposed equations using CPCHILD total score and selected CPCHILD domain scores can help estimate QALYs in the CP population. This may help in resource planning, when only the CPCHILD score is available.

Introduction

Quality-adjusted life years (QALYs) are the preferred health outcome metric used by national reimbursement agencies, such as the Pharmaceutical Benefits Advisory Committee (PBAC) in Australia and the National Institute for Health and Care Excellence (NICE) in the UK [1, 2]. The key advantage of QALYs is their ability to capture both quantity and quality of life in a single measure, which allows for a comprehensive evaluation of health interventions. QALYs enable policy makers to compare health benefits across various conditions and interventions, facilitating resource allocation decisions within public healthcare systems.

The premise of QALYs is that the number of years lived with a specific health condition is “weighted” for the quality of life associated with that condition during that time. The weights, or utilities, are usually measured using instruments that assess various dimensions of health-related quality of life. These instruments, which are presented in a multiple-choice format, are known as multi-attribute utility instruments (MAUIs). An individual’s responses to a MAUI are converted by a scoring algorithm into a numeric index anchored at 0 (death) and 1 (full health), reflecting preferences with respect to the various domains measured by the MAUI.
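As a minimal illustration of this weighting, the sketch below sums utility multiplied by duration over a sequence of health states. The function name and the example values are hypothetical and are not taken from this study.

```python
def qalys(periods):
    """Sum utility-weighted time over a list of (years, utility) pairs."""
    total = 0.0
    for years, utility in periods:
        if years < 0:
            raise ValueError("duration must be non-negative")
        total += years * utility
    return total


# Hypothetical profile: 2 years at utility 0.5, then 3 years at utility 0.8.
example = qalys([(2, 0.5), (3, 0.8)])
print(round(example, 2))
```

In this hypothetical profile, five calendar years contribute fewer than five QALYs because each year is scaled by a utility below 1.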

Measuring QALYs in children with cerebral palsy (CP) is important because this population needs comprehensive and specialised healthcare services including medical interventions, rehabilitation, assistive devices, and ongoing support. CP is a group of life-long movement and/or posture disorders resulting from injury or insult to the developing foetal or infant brain [3]. A systematic review reported that the medical costs for children with CP worldwide were 10–26 times greater than those for children who are typically developing with a positive relationship between expenditure and the severity of gross motor function [4].

Measuring QALYs in children with CP presents challenges due to limitations in the available MAUIs. The recently developed CP-specific instrument, the Cerebral Palsy 6 Dimensions (CP-6D), has not yet been validated in children with CP [5]. Moreover, generic instruments, such as the Health Utilities Index Mark 3 (HUI-3), the Assessment of Quality of Life-4D, and the EQ-5D-3L, may have limited sensitivity in children because they were adapted from instruments for adults [6]. The widely used generic MAUI for young people, the EQ-5D-Y, may not fully capture the multidimensional aspects of health-related quality of life that are specific to children with chronic neuromuscular conditions [7]. The absence of value sets for the English version of the EQ-5D-Y further restricts its application [8]. While Ryan et al. [9] proposed the Child Health Utility 9D (CHU9D) as a superior alternative to the EQ-5D-Y for measuring utility values in children with CP, due to its better alignment with their experiences and avoidance of extreme values, its routine use in CP studies remains limited.

Valid and reliable non-MAUI instruments, such as the Cerebral Palsy Quality of Life questionnaire (CPQoL) and the Caregiver Priorities and Child Health Index of Life with Disabilities (CPCHILD), are more frequently used in children with CP. To facilitate QALY measurement in this setting, a previous study mapped the CPQoL child version onto the CHU9D [10]. However, the algorithms were developed from mostly young participants with mild-to-moderate severities of gross motor function, so a means of estimating QALYs for the severe or non-ambulatory group is still lacking. The aim of the current study was to develop mapping regressions converting scores of the CPCHILD, an instrument designed specifically for non-ambulatory children with CP, onto CHU9D utilities. These regressions will allow the current form of the CPCHILD to be used for QALY estimation when only the CPCHILD is available, aiding economic evaluations and healthcare decision-making in this population.

Methods

This study followed the ISPOR good practices for mapping studies [11] and the Mapping onto Preference-based Measure Report Standards (MAPS) checklist [12].

Design and participants

A cross-sectional survey was conducted in 2019–2022, involving parents/caregivers of young people aged 5–18 years with a diagnosis of CP and functioning at Gross Motor Function Classification System (GMFCS) level IV or V, or non-ambulatory levels. In most instances, previous mapping studies, not limited to children with CP, have used convenience sampling approaches, with sample sizes ranging from 60 to 12,967 [13]. Considering the population size and challenges in participant recruitment for CP studies, the current study also employed a convenience sampling approach. The previous mapping study that focused on children with CP had a sample size of 76 [10]. For the current study, a target sample size of 150 participants was determined based on a total of 676 potential participants identified from the Victoria Cerebral Palsy Register (VCPR) with an expected response rate of 15% derived from a previous Australian study [14]. The VCPR is a population-based register of all young people born or living with CP in the Victorian state of Australia.

The study invitation was sent to a total of 732 potential participants, comprising the 676 eligible parents/caregivers identified from the VCPR and 56 identified through attendance at orthopaedic outpatient clinics at the Royal Children’s Hospital, Melbourne. Follow-up phone calls were made to 426 potential participants who did not opt out after receiving the invitation. Participants whose child had died, who had limited English, who were unreachable by phone, or who were under complex case management were excluded from the study.

The survey responses were collected and data managed using the secure, web-based Research Electronic Data Capture (REDCap) [15]. Ethics approval was obtained from the Royal Children’s Hospital Human Research Ethics Committee (HREC# 37238) and Deakin University Ethics Committee (Ref 2019-080). Consent was implied by completion of the survey.

Measures

The CPCHILD is a widely recognised and utilised instrument in the field of CP, measuring the caregiver’s perspectives of health status, function, comfort, well-being, and ease of caregiving for non-ambulatory children aged 5–18 years functioning at levels IV and V on the GMFCS [16]. The GMFCS classifies motor function in CP, ranging from level I (independent walking) to level V (children are dependent on wheeled mobility and have limited head and trunk control). The CPCHILD has been shown to be among the outcome measures with the strongest psychometric properties and clinical utility for children with CP [17, 18]. The CPCHILD questionnaire gathers information pertaining to the previous two weeks. It includes 37 items across six domains: (1) Personal care/activities of daily living, (2) Positioning, transferring and mobility, (3) Comfort and emotions, (4) Communication and social interaction, (5) Health, and (6) Overall quality of life. This study used the CPCHILD parent version 5.

The CPCHILD has a complex structure in which each of the six domains has different scoring sections. In domains 1 and 2, the scoring sections evaluate the degree of difficulty in performing activities and the level of assistance required. The scores range from 0 (no problem at all) to 6 (not possible) for ‘difficulty’ and from 0 (total assistance) to 3 (independent) for ‘level of assistance’. Domain 3 focuses on rating the frequency and intensity of pain or discomfort during certain daily activities, ranging from 0 (none of the time) to 5 (all the time) for ‘frequency’ and from 0 (severe) to 3 (none) for ‘intensity’. Items in domain 4 rate only the level of difficulty, while domain 5 includes questions about health resources used, such as doctor and hospital visits (0 for admitted > 7 days to 5 for none), overall health of the child (0 is very poor to 5 is excellent), and number of medications (0 for ≥ 5 medications to 5 for no medication). Domain 6 consists of a single item rating the child’s overall quality of life, ranging from 0 (very poor) to 5 (excellent) [16].

The CPCHILD total score is derived by averaging standardised scores of each item, while the domain scores average standardised item scores of relevant items. The standardised item score is computed from a sum of categorical rating responses of an item divided by the maximum possible raw item score. The CPCHILD manual provides more details about the scoring of CPCHILD [19]. Despite the complex structure of CPCHILD, its scoring should not raise any challenge for mapping.
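The scoring rule described above can be sketched as follows. This is an illustration only: the rescaling to a 0–100 range is an assumption inferred from the total score being reported out of 100, the function names and example ratings are hypothetical, and the CPCHILD manual [19] remains the authoritative source for scoring.

```python
def standardised_item_score(ratings, max_ratings):
    """Return 100 * (sum of an item's ratings) / (maximum possible raw score).

    The factor of 100 is an assumption made for this sketch.
    """
    if len(ratings) != len(max_ratings):
        raise ValueError("each rating needs a matching maximum")
    return 100.0 * sum(ratings) / sum(max_ratings)


def mean_score(item_scores):
    """Average standardised item scores (for one domain or for all items)."""
    return sum(item_scores) / len(item_scores)


# Hypothetical item with two scoring sections: 3 out of 6 and 1 out of 3.
item_a = standardised_item_score([3, 1], [6, 3])
# Hypothetical item with the maximum rating on both sections.
item_b = standardised_item_score([6, 3], [6, 3])
domain_score = mean_score([item_a, item_b])
print(round(item_a, 1), round(item_b, 1), round(domain_score, 1))
```

Because every item is rescaled before averaging, items with different numbers of scoring sections contribute equally to the domain and total scores.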

The CHU9D is a generic MAUI designed for children and adolescents aged 7–17 [20]. Xiong et al. [21] reported that the CHU9D can be used for children between the ages of 5 and 18, with parent-report for children aged 5–7 years and self-report for children aged 8–18 years. It consists of nine questions that assess the child’s worry, sadness, pain, fatigue, annoyance, schoolwork, sleep, daily routine, and ability to participate in activities. Caregivers provide their views on their child over the preceding 24 hours using a 5-level response rating, ranging from 1 (no problems) to 5 (severe problems) [22]. The CHU9D can be completed by either children themselves or a proxy [21]. In this study, a proxy-report was used because many non-ambulatory children with CP were unable to complete questionnaires by themselves. The CHU9D utilities were estimated using the Australian adolescent-specific algorithm [23], which generates utilities between −0.1059 and 1 [24].

Statistical analysis

Spearman’s correlation coefficients were calculated to explore the relationship and overlapping domains between the CPCHILD and CHU9D. A correlation coefficient > 0.7 was considered a strong correlation, 0.5–0.7 moderate, and < 0.5 weak [25].
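A minimal sketch of this step is shown below, using only the Python standard library. It computes Spearman’s coefficient as the Pearson correlation of tie-averaged ranks and labels the result with the cut-offs quoted above. Applying the cut-offs to the absolute value of the coefficient is an assumption of this sketch, and the function names are hypothetical; the study itself used Stata.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def strength(rho):
    """Label a coefficient using the cut-offs in the text (on |rho|)."""
    r = abs(rho)
    if r > 0.7:
        return "strong"
    if r >= 0.5:
        return "moderate"
    return "weak"
```

For example, `strength(0.56)` returns `"moderate"` and `strength(0.35)` returns `"weak"` under these cut-offs.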

There are two ways to undertake mapping studies: directly or indirectly. Direct mapping uses the total score, selected domain scores, or item scores of the source instrument to predict the utilities of the target instrument. Indirect mapping predicts the response ratings of each item of the target measure based on item or domain data of the source instrument. This study employed direct mapping for two main reasons: it does not require a large sample size, and the item structure of the CPCHILD is complex.

The ISPOR guidelines suggest that demographic information be included as predictors, so two core predictor sets were initially considered: Predictor set 1 (CPCHILD total score, age, gender, GMFCS level, and self-perceived economic status) and Predictor set 2 (selected CPCHILD domain scores, age, gender, GMFCS level, and self-perceived economic status). However, self-perceived economic status was excluded to enhance the applicability of the mapping algorithms, since other datasets are unlikely to contain this information. Predictors that were not statistically significant (p > 0.05) were also excluded from the models to achieve a robust combination of predicting variables [26]. Because of the expected sample size, this study did not consider mapping algorithms based on CPCHILD item scores.

To investigate the distribution of CHU9D utilities, a Shapiro–Wilk test for normality was conducted, and the result informed the choice of regression methods. Three regression techniques were considered: (1) ordinary least squares (OLS), a widely used technique in mapping research [27]; (2) the robust MM-estimator, which retains good efficiency in small samples [28]; and (3) generalised linear models (GLMs) with distributions specified a priori. Four GLM combinations of Gaussian and Gamma families with log and logit links were explored. OLS and GLM have been widely employed in previous mapping studies [27].
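For reference, the two link functions determine how a linear predictor is converted back into a predicted utility. The expressions below are the standard GLM definitions rather than anything specific to this study; the symbols $\eta$ (linear predictor), $\mu$ (predicted mean utility), $\beta_j$ (coefficients) and $x_j$ (predictors) are notation introduced here for illustration.

$$\eta = \beta_0 + \sum_j \beta_j x_j$$
$$\text{log link:}\quad \ln(\mu) = \eta \;\Rightarrow\; \mu = e^{\eta}$$
$$\text{logit link:}\quad \ln\left(\frac{\mu}{1-\mu}\right) = \eta \;\Rightarrow\; \mu = \frac{1}{1+e^{-\eta}}$$

With the logit link, predictions always lie strictly between 0 and 1, whereas the log link only guarantees positive predictions.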

Assessment of predictive accuracy

Model performance was assessed using the mean absolute error (MAE) and the concordance correlation coefficient (CCC). MAE is a commonly used indicator in mapping studies [27, 29] and is the average absolute difference between the observed and predicted CHU9D utilities. The CCC [30] measures the agreement between the observed and predicted CHU9D utilities. A smaller MAE and a larger CCC indicate better predictive ability of the mapping algorithm. Additionally, the differences between the predicted and observed means of CHU9D utilities were considered to identify the best and second-best mapping algorithms.
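Both indicators are simple to compute. The sketch below is an illustration in plain Python, with Lin’s CCC computed from population (divide-by-n) variances and covariance; the function names and the three-point example are hypothetical and are not study data.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def concordance_correlation(observed, predicted):
    """Lin's concordance correlation coefficient (divide-by-n moments)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n
    # The squared mean difference penalises systematic over- or under-prediction.
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Hypothetical values: predictions are compressed towards the mean.
obs = [0.2, 0.5, 0.8]
pred = [0.3, 0.5, 0.7]
print(round(mean_absolute_error(obs, pred), 3))
print(round(concordance_correlation(obs, pred), 3))
```

Unlike a plain correlation coefficient, the CCC falls below 1 whenever predictions are shifted or compressed relative to the observed values, even if the two series are perfectly correlated.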

Model validation

Because no external dataset was available, two internal validation approaches were used. The first approach (Validation 1) was a 5-fold cross-validation in which the full sample was randomly divided into five equal-sized sub-groups. In each iteration, four of the five sub-groups served as the “estimation sample” to develop the mapping algorithms, while the remaining sub-group was used as the “validation sample” to assess the performance of the algorithms. This process was repeated five times, so that each sub-group served once as the validation sample. The average MAE and CCC across the five iterations were computed. In the second approach (Validation 2), the mapping algorithms generated from the full sample were tested on two computer-generated random samples, with sample sizes of 80 and 50 [31].
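The cross-validation loop in Validation 1 can be sketched as below. For brevity, a one-predictor OLS fit stands in for the regression models actually compared in the study, and the data are synthetic values generated purely for illustration; all function names are hypothetical.

```python
import random


def fit_ols(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def k_fold_mae(x, y, k=5, seed=0):
    """Average out-of-fold MAE over k folds of near-equal size."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    maes = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the estimation sample only.
        b0, b1 = fit_ols([x[i] for i in train], [y[i] for i in train])
        # Score on the validation sample that the fit never saw.
        errors = [abs(y[i] - (b0 + b1 * x[i])) for i in held_out]
        maes.append(sum(errors) / len(errors))
    return sum(maes) / k


# Synthetic, illustrative data only (not study data).
rng = random.Random(1)
scores = [float(s) for s in range(5, 105, 5)]
utilities = [0.1 + 0.008 * s + rng.uniform(-0.05, 0.05) for s in scores]
cv_mae = k_fold_mae(scores, utilities)
print(round(cv_mae, 3))
```

Fixing the random seed makes the fold assignment reproducible, which matters when several candidate models are to be compared on identical splits.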

All statistical analyses were conducted in STATA SE16 [32].

Results

Sample and descriptive data

A total of 108 participants returned completed surveys for analysis. Figure 1 provides a flow chart illustrating the recruitment and participation of the participants.

Fig. 1 Participant flowchart

Table 1 presents characteristics of the participants, including their scores on the CPCHILD and CHU9D instruments. The age of the children ranged between 5 and 18 years, with a mean age of 12 years. Sixty-one percent were male and 48% functioned at GMFCS level V. The mean CPCHILD total score was 45.38 out of 100. The mean standardised domain scores ranged from 31 for the Positioning, transferring and mobility domain to 70 for the Comfort and emotions domain, with higher scores reflecting better health or easier care. The mean of the observed CHU9D utilities was 0.49. The Shapiro–Wilk test gave no evidence that the distribution of CHU9D utilities departed from normality (p-value 0.52). Supplement 1 shows histograms depicting the observed CHU9D utilities, CPCHILD total scores, and CPCHILD domain scores.

Table 1 Characteristics of participants

Table 2 presents the correlation coefficients of the CPCHILD domains and CHU9D utilities. The CPCHILD total score, the CPCHILD Comfort and emotions score, and the CPCHILD Overall quality of life score had moderate and significant correlations with the CHU9D utilities, with correlation coefficients of 0.56, 0.53, and 0.50, respectively. In contrast, the Personal care/activities of daily living and the Positioning, transferring and mobility domains showed weak correlations with CHU9D utilities, with correlation coefficients below 0.35. These two domain scores were excluded from Predictor set 2 based on their weak correlations.

Table 2 Correlation coefficients between CHU9D utilities, CPCHILD total score, and CPCHILD domain scores

Final model predictors

Age, gender, and GMFCS level were not significant predictors and did not improve the predictive performance of the models. Therefore, these variables were excluded from both predictor sets. The remaining predictor for Predictor set 1 was the CPCHILD total score. For Predictor set 2, the remaining predictors were the statistically significant standardised domain scores among CPCHILD Comfort and emotions, Communication and social interaction, Health, and Overall quality of life.

Mapping regressions

Table 3 presents all measures of model performance using the full sample. Means of predicted CHU9D utilities were close to the observed mean across all models. However, all regressions overestimated the lower boundary and underestimated the upper boundary of the CHU9D utilities, resulting in a narrower range of predicted than observed CHU9D utilities. The predicted minimum closest to the observed minimum utility was from the MM-estimator using Predictor set 2 (0.001 vs 0.058, Column 6). The highest predicted CHU9D utility was from the GLM with Gamma family and log link function using Predictor set 1 (0.983 vs 0.989, Column 7). Absolute differences between the predicted and observed CHU9D utilities were generally small across all regressions, except for the MM-estimator using Predictor set 2.

Table 3 Mapping model performance using full samples

For Predictor set 1, the GLM with Gamma family and logit link function outperformed the other models based on its lowest MAE and highest CCC in both the full sample and cross-validation (Supplement 2) analyses. The second-best performing model in both the full sample and cross-validation analyses was the MM-estimator. It had the second lowest MAE and a wider predictive range compared to the GLM with Gaussian family and logit link (Table 3, Supplement 2).

For Predictor set 2, the GLM with Gamma family and logit link function using the CPCHILD Comfort and emotions, Health, and Overall quality of life domain scores performed better than the other regression models. The second algorithm of choice was the GLM with Gamma family and log link using the same three domain scores. Supplement 3 presents scatter plots of the observed against the predicted CHU9D utilities for all mapping regressions.

Table 4 shows detailed regression results using the full sample. The algorithms of the preferred mapping function for each predictor set are as follows:

$$\text{Predictor set 1:}\quad \text{predicted CHU9D utility} = \frac{1}{1 + \exp\left( -\left( -2.1839 + 0.0471 \times \text{CPCHILD}_{\text{total score}} \right) \right)}$$
$$\text{Predictor set 2:}\quad \text{predicted CHU9D utility} = \frac{1}{1 + \exp\left( -\left( -2.2743 + 0.0112 \times \text{CPCHILD}_{\text{CE}} + 0.0120 \times \text{CPCHILD}_{\text{H}} + 0.0130 \times \text{CPCHILD}_{\text{QoL}} \right) \right)},$$

where $\text{CPCHILD}_{\text{CE}}$ is the CPCHILD Comfort and emotions standardised domain score, $\text{CPCHILD}_{\text{H}}$ is the CPCHILD Health standardised domain score, and $\text{CPCHILD}_{\text{QoL}}$ is the CPCHILD Overall quality of life standardised domain score. Both expressions apply the inverse of the logit link to the linear predictor, so predicted utilities lie between 0 and 1.
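The two preferred algorithms can be applied to CPCHILD scores as sketched below. The sketch assumes that the back-transformation is the standard inverse logit, consistent with the logit link of the preferred models, and that all scores are on the 0–100 standardised scale; the coefficients are those reported above, while the function names are hypothetical.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor to the (0, 1) interval."""
    return 1.0 / (1.0 + math.exp(-eta))


def chu9d_from_total(total_score):
    """Predictor set 1: CPCHILD total score to predicted CHU9D utility."""
    return inverse_logit(-2.1839 + 0.0471 * total_score)


def chu9d_from_domains(comfort_emotions, health, quality_of_life):
    """Predictor set 2: three standardised domain scores to predicted utility."""
    eta = (-2.2743
           + 0.0112 * comfort_emotions
           + 0.0120 * health
           + 0.0130 * quality_of_life)
    return inverse_logit(eta)


# Prediction at the sample mean CPCHILD total score reported in the Results.
print(round(chu9d_from_total(45.38), 2))
```

Under this assumption, the prediction at the mean total score of 45.38 is approximately 0.49, in line with the observed mean CHU9D utility.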

Table 4 Regression outputs

Discussion

Mapping algorithms play a crucial role in estimating health state utilities when a MAUI has not been administered. In the case of non-ambulatory children with CP, deriving CHU9D utilities from the CPCHILD is a valuable method for estimating the QALYs gained or lost as a result of interventions. The findings of this study can be used by researchers and clinicians who are interested in assessing utility values, providing valuable insights into healthcare decision-making and resource allocation in the context of cost-utility analysis.

The development of the mapping algorithms adhered to the ISPOR mapping guidelines [11]. These guidelines provide a framework for the systematic development and validation of mapping algorithms. Moreover, this mapping study was reported in accordance with the MAPS recommendations [12] to ensure transparency and consistency in reporting the mapping process.

The CPCHILD specifically focuses on children with greater physical impairments. It measures the health-related quality of life associated with functional limitations and well-being of these children from the perspective of their parents/caregivers. While some questions within the CPCHILD assess the ease of caregiving, the level of caregiving required is intrinsically linked to a child’s degree of impairment, which in turn has a direct influence on their well-being. This connection is further highlighted by a recent meta-analysis demonstrating a strong positive correlation between physical ability and quality of life in young individuals with CP [33]. This emphasises the importance of considering the caregiving aspect when evaluating the quality of life of this population.

The relationship between physical impairment and quality of life is also evident in this study. Our study population reported a substantially lower quality of life than the general population: the mean CHU9D utility was well below that of young Australians in general (0.49 vs 0.78) [34]. The physical and/or psychological health of the parents/caregivers may affect caregiving. However, this study did not investigate these impacts.

The selection of model predictors was based on careful consideration. The decision to exclude the CPCHILD Personal care/activities of daily living and Positioning, transferring and mobility domains was made due to their weak correlations. However, the CHU9D and CPCHILD total scores exhibited a moderate correlation. Moderate correlations were also observed between the CHU9D and CPCHILD Comfort and emotion and Quality of life domains (ρ 0.53 and 0.50, respectively). Correlation of CPQoL and CHU9D in the previous mapping study was 0.46 [10]. The moderate correlations are commonly observed in the mapping literature, as supported by previous studies [10, 26, 31].

All models overestimated the minimum value of CHU9D utilities and underestimated the maximum value, making the range of predicted CHU9D utilities narrower than the observed range. The narrow ranges of predicted values may raise concerns about the applicability of these mapping algorithms. However, this issue is not unique to this study and has been evident in other mapping studies [10, 35, 36]. While Sharma et al. [35] attributed the narrow predictive range to the non-normal distribution of CHU9D utilities, the CHU9D utilities in our study were normally distributed. The relatively high MAEs (around 0.15 on a utility scale with a maximum of 1) further support the presence of this issue. The sample size of this study is relatively small compared to other mapping studies, which may have affected the model estimations. It is also possible that a small number of non-ambulatory participants with extreme utility values (approximately 10% of participants) influenced the narrow ranges and high MAEs, but removing these participants did not resolve the issue (Supplement 4). To gain a better understanding of the predicted range, a larger sample with a wider spread of utility scores would be beneficial.

The model performance indicators of our best performing models were not as good as those of the previously published mapping study that estimated CHU9D utilities from the CPQoL [10]. The published mapping using selected CPQoL Child domain scores and a GLM with Gaussian family and logit link had MAE = 0.062 and CCC = 0.745 [10], whereas our best performing regression using selected CPCHILD domain scores and a GLM with Gamma family and logit link had MAE = 0.152 and CCC = 0.552. The higher MAEs and lower CCCs observed in this study are difficult to explain, given that the statistical mapping methodologies employed were similar in both studies and that the sample size in the current study was larger than that of the previous one (108 vs 76). The different goodness-of-fit findings between the two mapping studies could have resulted from differences in population characteristics, particularly as the population in this study had more profound impairment. Differences in classification systems (CPQoL vs CPCHILD) could also be a factor. Alternatively, these results may have stemmed from the reliance on proxy-reported information. The use of proxy-report could potentially introduce bias, as it may not fully capture the child’s subjective experiences and preferences, leading to discrepancies in the estimated utilities [37]. However, the previous mapping study [10] also utilised proxy-reported information. In CP research, the proxy-reported approach remains invaluable and is often the only feasible method. This study did not assess the bias of proxy-reporting on either measure, nor did it evaluate whether parents/caregivers adopted different perspectives while completing the questionnaires. Therefore, the extent to which proxy-reporting could lead to biased estimates remains unknown.

Despite facing challenges in participant recruitment during the Covid-19 lockdowns, this study was able to recruit over 100 participants, which is a significant achievement considering the pragmatic difficulties in recruiting large numbers of this specific population. While the sample size is relatively small compared to other mapping studies, it is larger than the mapping study of the CPQoL onto the CHU9D. In addition, previous CP trials have experienced low participant recruitment [10, 38]. To further validate and confirm the predictive ability of the suggested models, future studies with larger sample sizes are recommended.

Conclusion

Twelve mapping models were developed to determine CHU9D utilities from either the CPCHILD total score or selected domain scores. Among these models, the regression model based on selected domain scores is recommended because of its superior performance indicators and predicted range compared to the model based on the CPCHILD total score. The choice of model is contingent upon data availability. The use of alternative mapping algorithms should not lead to drastically different findings in economic evaluation, as their predictive performance is comparable. Future studies that employ our suggested models should investigate the impact of utilising different algorithms. It should also be highlighted that mapping is considered a second-best solution, for use when direct MAUI data collection is not feasible.

To enhance the robustness of mapping CHU9D utilities from CPCHILD, future mapping studies should include a larger sample size, ensuring representation across the entire quality of life scale including those at the extreme ends of the scale, and incorporate an external dataset for validation.