A comparison of the Child Health Utility 9D and the Health Utilities Index for estimating health utilities in pediatric inflammatory bowel disease

Purpose Health utilities are challenging to ascertain in children and have not been studied in pediatric Crohn’s disease (CD) and ulcerative colitis (UC). The objective was to assess discriminative validity by comparing utilities elicited using the Child Health Utility-9 Dimension (CHU9D) to the Health Utilities Index (HUI) across multiple disease activity scales in pediatric UC and CD. Methods Preference-based instruments were administered to 188 children with CD and 83 children with UC aged 6 to 18 years. Utilities were calculated using CHU9D adult and youth tariffs, and HUI2 and HUI3 algorithms in children with inactive (quiescent) and active (mild, moderate, and severe) disease. Differences between instruments, tariff sets and disease activity categories were tested statistically. Results In CD and UC, all instruments detected significantly higher utilities for inactive compared to active disease (p < 0.05). Mean utilities for quiescent disease ranged from 0.810 (SD 0.169) to 0.916 (SD 0.121) in CD and from 0.766 (SD 0.208) to 0.871 (SD 0.186) in UC across instruments. Active disease mean utilities ranged from 0.694 (SD 0.212) to 0.837 (SD 0.168) in CD and from 0.654 (SD 0.226) to 0.800 (SD 0.128) in UC. Conclusion CHU9D and HUI discriminated between levels of disease activity in CD and UC regardless of the clinical scale used, with the CHU9D youth tariff most often displaying the lowest utilities for worse health states. Distinct utilities for different IBD disease activity states can be used in health state transition models evaluating the cost-effectiveness of treatments for pediatric CD and UC. Supplementary Information The online version contains supplementary material available at 10.1007/s11136-023-03409-x.


Plain English Summary
Inflammatory Bowel Disease (IBD) is an ongoing childhood condition that can be physically painful and greatly reduces the quality-of-life of affected children. It can be difficult to measure quality-of-life in children, particularly in very young children. Being able to measure quality-of-life for different levels of disease is important to understand the effectiveness of new treatments. Using a generic common measure of quality-of-life enables a comparison of health improvement across different patient populations. This is very useful for decision-makers considering value-for-money when allocating healthcare budgets. The aim was to compare different questionnaires to see which was best able to reflect changes in quality-of-life when IBD worsens in a group of 271 children with IBD. While all questionnaires were able to pick up quality-of-life differences when disease

Introduction
Ulcerative colitis (UC) and Crohn's disease (CD), collectively referred to as inflammatory bowel disease (IBD), are a class of chronic gastrointestinal diseases characterized by periods of unpredictable flares of inflammation of the gastrointestinal tract, abdominal pain, diarrhea, fatigue and weight loss, and periods of symptomatic remission [1]. Pediatric IBD is of particular concern because the incidence is increasing [2][3][4], growth may be affected [1,5] and it can have significant quality-of-life impacts [6][7][8][9]. With an increase in costly IBD treatments such as biologics, there is a need for economic evaluations to inform funding. However, measuring preference-based health-related quality-of-life (HRQOL) to generate health state utilities to calculate quality-adjusted life years (QALYs) for use in economic evaluation is challenging in children. In addition to challenges children experience in comprehending abstract concepts, HRQOL attributes featured in adult instruments may not be applicable to children. The validity of applying existing preference-based HRQOL instruments in children has been questioned [10][11][12][13]. Head-to-head comparisons in pediatric patient populations are lacking [14,15]. Further, whether the set of utility weights that underlie HRQOL classification systems should be derived from adults valuing pediatric states or directly from children continues to be debated [10,16].
The Health Utilities Index (HUI) has been used in children and adults [17] and generates utilities using the HUI Mark 2 (HUI2) or Mark 3 (HUI3). The HUI2 has 7 dimensions: Sensation, Mobility, Emotion, Cognition, Self-Care, Pain and Fertility, and the HUI3 has 8 dimensions: Vision, Hearing, Speech, Ambulation, Dexterity, Emotion, Cognition and Pain [17,18]. The underlying utility weights for HUI2 and HUI3 were developed with adults. While the HUI has been used in numerous patient populations and across several age groups [18], it had not been used in pediatric IBD [7]. The Child Health Utility 9D (CHU9D) was developed in 2009 specifically for children and with children [19,20]. It features a classification system of nine dimensions relevant to child health: Worried, Sad, Pain, Tired, Annoyed, Schoolwork, Sleep, Daily routine, and Activities [21]. Sets of utility weights obtained from Australian adults or adolescents are available [22,23].
An earlier study by our team found the CHU9D to be valid and reliable in pediatric CD and UC with moderate correlations observed between CHU9D, HUI2, and HUI3 utilities [24]. Moderate to strong correlations between the CHU9D, HUI2, HUI3 and the disease-specific IMPACT-III or the generic PedsQL HRQOL measures were also observed [25]. Multiple clinical measures may be used to determine disease activity in pediatric IBD. It is critical to examine and compare utilities associated with different measures to inform health state modeling used in cost-effectiveness analysis. The objective was to directly compare CHU9D using adult and youth tariffs to the HUI2 and HUI3 for determining health state utilities for distinct disease activity levels defined by different clinical scales in children with UC and CD. Such information is vital to assess the discriminant validity of preference-based HRQOL instruments and inform the choice of measures for use in children. This work also aims to derive utilities for health states for cost-effectiveness analysis in pediatric IBD.

Methods
The study was approved by the Research Ethics Boards of the Hospital for Sick Children in Toronto, ON (#1000039604), the IWK Health Centre in Halifax, NS (#1015558), and the Janeway Children's Health and Rehabilitation Centre in St. John's, NL (#13.229). The study was performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards. All parent participants provided informed consent and children provided informed assent.

Study design and participants
Participants were recruited from the Canadian Children Inflammatory Bowel Disease Network (CIDsCaNN) (https://cidscann.ca), a repeated measures observational cohort study of Canadian children with newly diagnosed IBD [26]. Children were treated at the discretion of the attending clinicians. Children aged 6 to 18 years without co-morbidities were recruited between February 2014 and December 2018.

Measures
As indirect approaches to utility estimation, the HUI and CHU9D consist of a classification system and an underlying tariff set. The instrument attributes were derived from qualitative research to capture the construct of HRQOL [17,27]. Underlying tariff sets used to score the instrument are constructed by eliciting utilities for a wide range of health states from the public using standard gamble, time trade-off or a discrete choice approach [17,23,27,28]. A utility function is then derived to enable calculation of a summary utility score with interval scale properties for any given health state from 0 (death) to 1.0 (perfect health). Details are provided in Online Resource 1. The weighted Pediatric Crohn's Disease Activity Index (wPCDAI) [29,30] and the Pediatric Ulcerative Colitis Activity Index (PUCAI) [31][32][33] are widely accepted disease activity measures for children with CD and UC, respectively. The disease activity of each participant was categorized as none (quiescent), mild, moderate, or severe based on PUCAI and wPCDAI numerical score cut-offs [29,30,32,33]. For this paper, disease activity labeled as 'none,' 'remission' or 'quiescent' is referred to as quiescent. A Physician Global Assessment (PGA) of disease activity also categorized disease activity as quiescent, mild, moderate, severe, or fulminant. As a global measure, the PGA is based on the physician's determination of a patient's health, requires less data and relies less on objective measures than the wPCDAI and the PUCAI. Thus, the PGA rating may be more readily attainable [30,31]. The wPCDAI and PUCAI have been correlated with the PGA [30,31,34]. Disease activity was grouped into two categories: inactive (quiescent) and active (mild, moderate, severe or fulminant).
This dichotomization recognizes that an important treatment difference lies between active disease, such as relapse with varying levels of disease activity, versus inactive disease, such as remission, where medications may be withdrawn. This facilitates the option of creating two distinct disease activity states with corresponding health utilities for use in health state transition modeling.
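The grouping described above can be sketched in a few lines. This is a minimal illustration only: the function name `dichotomize` and the label sets are hypothetical, not taken from the study code, and the numerical wPCDAI and PUCAI cut-offs are deliberately not reproduced here.

```python
# Hypothetical helper (not the authors' code): collapse a disease activity
# category into the two states used for health state transition modeling.
QUIESCENT_LABELS = {"none", "remission", "quiescent"}
ACTIVE_LEVELS = {"mild", "moderate", "severe", "fulminant"}


def dichotomize(category: str) -> str:
    """Return 'inactive' for quiescent disease and 'active' otherwise."""
    label = category.strip().lower()
    if label in QUIESCENT_LABELS:
        return "inactive"
    if label in ACTIVE_LEVELS:
        return "active"
    # Fail loudly rather than silently misclassifying an unexpected label.
    raise ValueError(f"Unknown disease activity category: {category!r}")


if __name__ == "__main__":
    for level in ["Quiescent", "mild", "Severe", "fulminant"]:
        print(level, "->", dichotomize(level))
```

Raising an error on unrecognized labels is a design choice for the sketch; a real pipeline might instead record such cases as missing.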

Data collection
The CHU9D and HUI were administered electronically in English via REDCap [35] at repeat assessments and were self-completed or self-assessed but interviewer-assisted, depending on the preference of the participant or caregiver. IBD-related disease activity was assessed at the time of CHU9D and HUI completion by study physicians using the PGA, and the wPCDAI [30] or PUCAI [33] for CD and UC patients, respectively.

Analysis
This study aimed to assess and compare the discriminant validity [36,37] of the CHU9D and HUI by examining the ability of each tool and tariff set to distinguish between levels of disease activity as defined by the PGA, the wPCDAI and PUCAI. For each participant, the first available pair of complete date-matched CHU9D and HUI questionnaires administered following disease diagnosis was analyzed. As no US or Canadian tariffs were available at the time of analysis, CHU9D utility weights were calculated using Australian adult tariffs and Australian adolescent tariffs with scoring algorithms provided in STATA by the developers [23,38]. HUI utility weights were calculated based on HUI2 and HUI3 algorithms provided by Health Utilities Inc. (http://healthutilities.com/) under licence. The CHU9D and HUI algorithms were coded and all instruments scored using the R statistical software program (v. 4.0.0) [39]. The optional HUI2 "Fertility" attribute was omitted for this pediatric population.
Statistical analysis was conducted using R (v. 4.0.0) [39]. CD and UC patient data were analysed separately. Descriptive statistics were compiled using the Table One package in R [40]. Chi-square test, t test (for continuous variables), and Kruskal-Wallis rank-sum tests were used for demographic comparisons between sexes. The Shapiro-Wilk normality test was performed to examine the distribution of the CHU9D utilities calculated with adult and youth tariffs and for the HUI2 and HUI3 health utilities. The null hypothesis was rejected, and normality could not be assumed. Therefore, Kruskal-Wallis rank-sum tests were used to compare utilities between instruments. Medians and interquartile ranges (IQR) were calculated for all estimates in addition to means and standard deviations (SD). CHU9D (adult and youth tariffs), HUI2 and HUI3 mean and median utilities were determined by disease activity level based on wPCDAI, PUCAI and PGA and for the dichotomized categories of active and inactive disease. Spearman correlations were determined between PGA and wPCDAI scores in CD subjects and between PGA and PUCAI scores in UC subjects. A Wilcoxon rank-sum test with continuity correction and Bonferroni adjustment, using the R 'stats' package [39], was conducted to compare CHU9D health utilities using adult and youth tariffs, HUI2 and HUI3 utilities between males and females, and to compare utilities between different disease activity levels as assessed by the PGA, wPCDAI and PUCAI. Box plots of mean health utilities for different activity levels were plotted using the R 'ggpubr' package [41].
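To make the pairwise comparison step concrete, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the normal approximation with continuity correction and a tie-corrected variance, followed by a Bonferroni adjustment across all pairs of activity levels. It is written in Python with the standard library only, whereas the study used R; the function names and the utility values in the example are hypothetical and are not study data.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks (ties share the average rank) and tie group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_test(x, y):
    """Two-sided p-value, normal approximation with continuity correction."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2      # U statistic for sample x
    diff = u - n1 * n2 / 2                        # deviation from its null mean
    tie_term = sum(t ** 3 - t for t in ties) / (n * (n - 1))
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        return 1.0                                # all observations identical
    correction = 0.5 if diff > 0 else (-0.5 if diff < 0 else 0.0)
    z = (diff - correction) / sigma
    return min(1.0, 2 * (1 - NormalDist().cdf(abs(z))))


def pairwise_bonferroni(groups):
    """Bonferroni-adjusted p-values for every pair of groups."""
    pairs = list(combinations(groups, 2))
    m = len(pairs)
    return {(a, b): min(1.0, rank_sum_test(groups[a], groups[b]) * m)
            for a, b in pairs}


if __name__ == "__main__":
    # Hypothetical utilities by activity level, for illustration only.
    example = {
        "quiescent": [0.95, 0.91, 0.88, 0.97, 0.85, 0.93],
        "mild": [0.84, 0.80, 0.90, 0.78, 0.86],
        "severe": [0.55, 0.62, 0.70, 0.48],
    }
    for pair, p_adj in pairwise_bonferroni(example).items():
        print(pair, round(p_adj, 4))
```

With groups as small as those in the example, an exact test would usually be preferable to the normal approximation; the approximation is used here only to mirror the continuity-corrected test named in the text.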

Sample characteristics
A total of 312 children with CD who consented to CIDsCaNN were eligible to participate in this sub-study. Of these, 116 (37.2%) were excluded due to the unavailability of date-matched CHU9D, HUI and clinical measures and 8 (< 1%) were excluded due to the use of proxy respondents, resulting in a CD sample of 188. A total of 138 CIDsCaNN participants with UC were eligible. Of these, 53 (38.4%) were excluded due to the unavailability of date-matched questionnaires and 2 (< 1%) were excluded due to the use of proxy respondents, resulting in a UC sample of 83. There were no statistically significant differences in demographic and health characteristics between males and females with CD and with UC (Table 1). At time of first CHU9D-HUI paired assessment, 39.9% of CD participants had quiescent (inactive) disease and 49.5% had mild, moderate, or severe (active) disease activity based on wPCDAI scores, and 55.3% had active disease based on PGA. In UC, 55.4% had inactive and 43.4% had active disease based on PUCAI scores, and 47.0% had active disease based on PGA. The Spearman correlation between the PGA and wPCDAI scores was 0.85 (p < 0.05) in CD and was 0.91 (p < 0.05) between the PGA and PUCAI scores in UC, indicating strong correlation between the clinical health assessment scales.

Overall utilities
As seen in Table 2, for the CD sample as a whole, mean/median utilities ranged from 0.757/0.792 to 0.873/0.926 across instruments. All instruments demonstrated lower utilities with increasingly active disease states. CHU9D youth tariff utilities consistently exhibited the lowest mean and median scores while the HUI2 almost always exhibited the highest overall mean and median scores. For UC as a whole, mean/median utilities ranged from 0.719/0.737 to 0.833/0.888. All instruments generally demonstrated lower utilities with increasingly active disease except where small sample sizes resulted in unstable estimates. CHU9D utilities calculated with youth tariffs were consistently lower than those calculated with adult tariffs. CHU9D youth tariff utilities most often exhibited the lowest and the adult tariff the highest mean and median scores across instruments and activity levels. Mean/median utilities were significantly lower in UC compared to CD for each instrument (p < 0.05). The observed differences between CHU9D adult and youth tariffs, HUI2 and HUI3 utilities for inactive versus active disease were statistically significant in CD (Fig. 1) and UC (Fig. 2).

CD
In CD, mean/median utilities ranged from 0.810/0.821 to 0.916/0.947 for quiescent disease and from 0.562/0.598 to 0.794/0.819 for severe activity across instruments and scales (Table 2). When disease activity was defined by the PGA, quiescent disease utilities were significantly greater than mild activity for HUI3 alone (p < 0.05). Quiescent disease utilities were significantly greater than moderate activity utilities with the CHU9D adult and youth tariffs (p < 0.05) but not with HUI2 or HUI3. Quiescent disease utilities were significantly greater than severe activity utilities for CHU9D (youth tariff) and for HUI2 and HUI3 (p < 0.05). When disease activity was based on the wPCDAI, quiescent or mild activity utilities were significantly greater than severe utilities for all tariff sets and instruments (p < 0.05). There were no significant differences between mild and moderate disease activity utilities. The means and medians of utilities from CHU9D adult and youth tariffs, HUI2 and HUI3 for inactive and active disease states in CD and UC are presented in Table 3. Mean/median utilities for active CD ranged between 0.694/0.720 and 0.837/0.896 across all utility instruments. Significantly greater utilities were observed for inactive compared to active states for all severity scales and utility instruments (p < 0.05). No statistically significant differences were observed in utilities between PGA and wPCDAI definitions of active disease; similar utilities for PGA and wPCDAI indicate overlap across clinical definitions and substantiate the correlation between scales.

UC
In UC, mean/median utilities ranged from 0.766/0.803 to 0.871/0.936 for quiescent disease and from 0.552/0.582 to 0.802/0.805 for severe activity across all instruments and scales (Table 2). The smaller UC sample size was associated with wider ranges of utilities within activity levels across instruments. No significant differences in utilities were observed between quiescent, mild, or moderate activity compared to severe activity for CHU9D adult and youth tariffs with the PGA and PUCAI. For the PGA, quiescent disease utilities were significantly greater than mild activity for HUI3 (p < 0.05) and were significantly greater than moderate activity for HUI2 and HUI3 (p < 0.05). There were no significant differences in utilities between PUCAI activity levels for HUI2, but quiescent disease utilities were significantly greater than severe activity utilities for the HUI3 (p < 0.05). Very small sample sizes (≤ 10) in the moderate and severe groups resulted in unstable estimates.
Figure caption (beginning missing): ... adult tariffs, B CHU9D youth tariffs, C HUI2, and D HUI3. There was a significant difference between utilities in active and inactive disease (p < 0.05)
Utilities for inactive UC were significantly greater than active disease utilities across all instruments and scales (p < 0.05) (Table 3). Mean/median utilities for active disease ranged from 0.654/0.664 to 0.800/0.816. As in CD, comparable utilities for PGA and PUCAI disease activity levels for each health utility instrument reflect the correlation between scales.
No statistically significant differences between utilities for males and females derived from CHU9D adult tariffs, CHU9D youth tariffs, HUI2 or HUI3 were observed in CD (Table 4). In UC, CHU9D utilities calculated with adult and with youth tariffs were significantly higher for males compared to females (p < 0.02). Differences in utilities between males and females with UC were not statistically significant for HUI2 or HUI3.

Discussion
Despite being generic, all HRQOL instruments and tariff sets discriminated moderately well between quiescent, mild, moderate and severe disease activity in CD and UC. Compared to the adult tariff set, CHU9D youth tariff utilities were consistently lower and displayed a greater range across disease activity levels for CD and UC. Similarly, the HUI3 displayed lower utilities and a wider range across disease activity levels in CD and UC compared to the HUI2.
While specific utilities for pediatric CD and UC are lacking, a meta-analysis reported mean utilities of 0.860 with the HUI2 and 0.882 with the HUI3 for digestive system disorders other than IBD, such as liver diseases, gastric ulcer, and other disorders [42]. Combined pediatric chronic diseases had mean utilities of 0.924 using a standard gamble (SG), 0.884 using the HUI2 and 0.834 with the HUI3 [42]. A meta-analysis of eleven adult CD studies determined a mean utility of 0.8403, 95% CI (0.8012, 0.8794) for remission, 0.7533, 95% CI (0.6887, 0.8178) for active disease, 0.8619, 95% CI (0.8016, 0.9223) for mild disease, 0.7318, 95% CI (0.6271, 0.8364) for moderate disease and 0.5102, 95% CI (0.3554, 0.6650) for severe disease [43], values comparable to utilities observed in the present analysis. The results suggest that, similar to adults, children with IBD can experience a range of utilities across clinically meaningful disease activity levels. A meta-analysis of 15 adult UC studies demonstrated a mean utility of 0.8726, 95% CI (0.8457, 0.8995) for remission, 0.6992, 95% CI (0.5847, 0.8136) for active disease, 0.7834, 95% CI (0.7265, 0.8403) for mild disease, 0.6969, 95% CI (0.3959, 0.9978) for moderate disease, and 0.7059, 95% CI (0.5065, 0.9054) for severe disease [43]. The present study demonstrated comparable results to adults for remission, but lower utilities for more active disease; however, active disease utilities fell within the 95% confidence intervals reported for adults.
Fig. 2 Boxplots comparing utilities in ulcerative colitis patients in inactive and active disease categories as determined by the Pediatric Ulcerative Colitis Activity Index (PUCAI) for: A CHU9D adult tariffs, B CHU9D youth tariffs, C HUI2, and D HUI3. There was a significant difference between utilities in active and inactive disease (p < 0.05)
Assigning utilities to health states can be complicated by the existence of multiple clinical measures used to describe disease activity. The wPCDAI has been found to perform better in measuring CD disease activity than other versions of the PCDAI and the Harvey-Bradshaw Index (HBI), developed for use in adult patients [30,44]. In the present study, correlations between the PGA and clinical measures of disease activity for CD and UC were at least 0.85, and the ability of the preference-based instruments to distinguish between PGA-defined disease activity levels suggests that when laboratory and other objective clinical data are unavailable, PGA utilities can be used in economic modeling. Dichotomizing disease activity as active or inactive may further facilitate modeling when a variety of scales are used or when data needed for finer disease activity stratification are missing. It is important to note that multi-item measures of current disease activity may not incorporate how a patient's prior disease experience or duration of disease influences their health state preferences. For example, long-term healing and achieving stable disease may reduce anxiety, fatigue, and pain, which may be observed as improvement in HRQOL over time [45,46].
A common challenge is that while all instruments aim to capture the construct of HRQOL, they may return different utilities for the same health state. These discrepancies may be due to the different domains, classification systems and underlying weights of each instrument. This raises the issue of comparability and interchangeability of pediatric health utility instruments with each other and with adult instruments. It must also be acknowledged that HRQOL in children differs from adults, and also differs within pediatric age groups from neonate to adolescent [47][48][49][50]. While the HUI2 and HUI3 have long been used in pediatric as well as adult populations, the CHU9D was designed exclusively as a tool for children aged 7 to 17 years [28,51,52]. Our previous research reported correlations of 0.62-0.69 between the CHU9D and HUI2 and HUI3, with slightly higher correlations for the youth compared to the adult tariff set [24]. The agreement between CHU9D and HUI2 or HUI3 was greater at higher utilities [24]. That research indicated that the CHU9D Sleep domain, a domain not present in the HUI2 or HUI3, had the lowest domain score when youth tariffs were used but Pain was scored lowest with the adult tariff set. The Pain domain score also ranked lowest for the HUI2 and HUI3. The difference between CHU9D youth and adult tariffs with regard to which domain ranked lowest suggests that youth and adults may place more weight on different attributes [38,53]. Not surprisingly, the highest correlations between the CHU9D and HUI2 or HUI3 were observed for Pain [24].
Table 3 Comparison of CHU9D, HUI2 and HUI3 utilities in CD and UC patients by inactive and active disease
Table 4 Comparison of CHU9D, HUI2 and HUI3 utilities in CD and UC patients stratified by sex and disease activity level
Pediatric IBD patients can experience pain and this common domain may have contributed to the ability of the CHU9D as well as the HUI2 and HUI3 to distinguish between levels of disease activity in the present study. This may not be the case for other pediatric conditions. The present study demonstrated that compared to the HUI2, the HUI3 generally yielded lower mean and median utilities with a wider range across activity levels, possibly due to different domains, which may account for its superior ability to distinguish between quiescent and mild disease activity levels. Although differences in utilities between instruments were mostly small, they could impact QALY calculations. A probabilistic analysis that integrates ranges of utilities for a given health state corresponding to the values observed in this study is recommended. Although the CHU9D was initially developed with utility weights obtained from adults, an adolescent tariff set was created from a sample of Australian adolescents using a best-worst scaling method [23]. Ratcliffe et al. found that adults placed less weight on mental health impairments (worried, sad, annoyed) and more weight on moderate to severe levels of pain compared to adolescents. Whereas youth tariffs may be more reflective of the adolescent experience and childhood in general compared to adult tariffs, IBD and other chronic pediatric conditions have ages of onset below ten years. Choosing a valuation method that can be used in children to generate underlying weights remains a challenge [14]. At present, guidelines for economic evaluation prefer utilities elicited using a SG or TTO approach [54,55]. As pressure increases to conduct cost-utility analysis to inform budget allocation for drugs and technologies for children, guidelines will need to reflect newer methods and approaches for eliciting and deriving utilities in children.
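One common way to carry out the probabilistic analysis recommended above is to draw each health state utility from a beta distribution whose parameters are matched to a reported mean and spread. The sketch below assumes this method-of-moments approach, which the source does not prescribe. It uses the CHU9D youth tariff values reported for CD in this study (0.81, SD 0.17 for inactive; 0.69, SD 0.21 for active) as inputs; whether the spread should be the SD or the standard error of the mean is a modeling choice left to the analyst. Function names are hypothetical.

```python
import random


def beta_params(mean, sd):
    """Method-of-moments beta parameters for a utility bounded in (0, 1)."""
    if not 0 < mean < 1:
        raise ValueError("mean must lie strictly between 0 and 1")
    common = mean * (1 - mean) / sd ** 2 - 1
    if common <= 0:
        raise ValueError("sd is too large for a beta distribution with this mean")
    return mean * common, (1 - mean) * common


def sample_utilities(mean, sd, n, seed=0):
    """Draw n utilities from the matched beta distribution (seeded for repeatability)."""
    rng = random.Random(seed)
    alpha, beta = beta_params(mean, sd)
    return [rng.betavariate(alpha, beta) for _ in range(n)]


if __name__ == "__main__":
    # Inputs: CHU9D youth tariff utilities for CD reported in this study.
    for state, mean, sd in [("inactive", 0.81, 0.17), ("active", 0.69, 0.21)]:
        draws = sample_utilities(mean, sd, n=10_000, seed=1)
        print(state, "simulated mean:", round(sum(draws) / len(draws), 3))
```

A beta distribution keeps every draw between 0 and 1, which suits these instruments' typical range but would need adapting if states valued below zero had to be represented.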
The present results can be used to populate health state transition models for cost-utility analysis comparing pediatric IBD treatments, including biologics and less costly biosimilars. Health state transition (Markov) models incorporate the probability of transitioning between health states characterized by different levels of disease activity or severity [56,57]. The effectiveness of different treatments can be compared by capturing how much time is spent in a given health state (e.g., inactive versus active) over the designated study time horizon. For example, from the present analysis, a CHU9D youth utility of 0.81 (SD 0.17) can be assigned to a disease remission (inactive) health state and 0.69 (SD 0.21) would be assigned to a relapse (active) state in CD [58]. More effective treatments will result in children spending more time in the higher utility inactive state. The better that active disease states can be distinguished from inactive states in terms of utility, the more sensitive an analysis will be to true differences in treatment effectiveness. Thus preference-based HRQOL measures that are better at discriminating between active and inactive disease will be stronger options for conducting cost-effectiveness analysis compared to instruments that are less discriminant.
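As a rough illustration of how such utilities enter a health state transition model, the sketch below runs a two-state (inactive, active) Markov cohort and accumulates QALYs. The utilities are the CHU9D youth tariff values for CD quoted above; the monthly transition probabilities, the one-year horizon, and the two treatment labels are entirely hypothetical, and discounting and half-cycle correction are omitted for brevity.

```python
def cohort_qalys(p_relapse, p_remit, u_inactive, u_active,
                 n_cycles, cycle_years, start_inactive=1.0):
    """Accumulate QALYs for a two-state Markov cohort (no discounting)."""
    inactive = start_inactive
    active = 1.0 - start_inactive
    qalys = 0.0
    for _ in range(n_cycles):
        # Utility-weighted time spent in each state during this cycle.
        qalys += (inactive * u_inactive + active * u_active) * cycle_years
        # Move the cohort forward one cycle.
        inactive, active = (
            inactive * (1 - p_relapse) + active * p_remit,
            inactive * p_relapse + active * (1 - p_remit),
        )
    return qalys


if __name__ == "__main__":
    U_INACTIVE, U_ACTIVE = 0.81, 0.69  # CHU9D youth tariff, CD (this study)
    # Hypothetical monthly probabilities for two illustrative treatments.
    scenarios = {
        "treatment A": {"p_relapse": 0.10, "p_remit": 0.30},
        "treatment B": {"p_relapse": 0.05, "p_remit": 0.30},
    }
    for name, probs in scenarios.items():
        total = cohort_qalys(probs["p_relapse"], probs["p_remit"],
                             U_INACTIVE, U_ACTIVE,
                             n_cycles=12, cycle_years=1 / 12)
        print(name, "QALYs over one year:", round(total, 4))
```

In this sketch the treatment with the lower relapse probability keeps more of the cohort in the higher-utility inactive state, so it accrues more QALYs; the gap between the two state utilities determines how strongly that difference shows up.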
Including multiple clinical disease activity scales and categorizations, preference-based instruments, and tariff sets in the present study provides researchers with the opportunity to choose utilities that align best with their target study population, preferred health economic protocols, and clinical practice. Further, the present study indicates that in the absence of detailed objective clinical data for the complete scoring of the wPCDAI or PUCAI, a PGA may provide reliable utility estimates across disease activity levels. The similarities in utilities within HRQOL instruments when disease activity is defined by the PGA or wPCDAI in CD and by the PGA or PUCAI in UC suggest that QALY calculations will not be affected and economic evaluations conducted using utilities associated with disease activity described by any of these clinical scales would be comparable. Despite uncertainty arising from variation across instruments and tariff sets, establishing genuine pediatric health utilities in CD and UC addresses an important gap. While a strength was comparing different pediatric utility instruments within the same groups of patients, the study did not include alternative approaches such as TTO, visual analog scales, or discrete choice options to generate utilities. These approaches can be challenging to administer in children. Another strength was the comparison of utilities between CD and UC populations at the same clinical sites, thereby controlling for variations in practice patterns or standards of care. However, enrollment was limited to three Canadian centres, which may limit generalizability. The need to examine sub-groups according to ethnic diversity and gender is growing in importance. The present analysis found that utilities were significantly lower in females with UC as measured by the CHU9D. Such a finding has not been reported previously in IBD except in adult CD patients with utility assessed using the SF-6D [59].
An important limitation was the small samples in some of the disease activity levels. A large proportion of participants in this observational cohort were well managed and in a quiescent state, and few had severe disease activity. This hampered the comparison of utilities between quiescent, mild, moderate and severe activity states. However, when dichotomized into active and inactive disease, there was sufficient statistical power to detect a significant difference in utilities in CD and in UC with all instruments. Another limitation was the skewness of the data. It is important to consider median utilities and IQRs alongside means and standard deviations to interpret the findings.
In conclusion, the CHU9D calculated with adult and youth tariffs, the HUI2 and the HUI3 discriminated between levels of disease activity experienced by children with CD and UC from quiescent to severe disease activity. These utilities may be valuable for health state transition models assessing the cost-effectiveness of emerging treatments in pediatric IBD.