Key Points for Decision Makers

Overall, the Pediatric Quality of Life Inventory™ v4.0 Generic Core Scales (PedsQL GCS) and Child Health Utility 9D (CHU9D) instruments both demonstrated consistent psychometric robustness against a set of absolute criteria for the psychometric properties assessed in this study; however, the CHU9D was more variable in its performance on some psychometric properties, such as acceptability and known group validity.

This study expands the evidence base of psychometric performance of the PedsQL GCS and CHU9D in a large representative cohort of Australian children and adolescents with common chronic health conditions.

Results from this study can aid in appropriate health-related quality-of-life instrument selection by researchers and clinicians, to ensure that clinical and policy decision-making outcomes are based on consistent and robust research and evaluations.

1 Introduction

Patient-reported outcome measures (PROMs) have become increasingly important in health outcomes research in children and young people, including in clinical trials, evaluations of health systems, and informing decision making [1, 2]. Generic PROMs include measures that aim to assess a broad concept of health-related quality of life (HRQOL) that is relevant to all populations and therefore enable comparisons across clinical conditions and between clinical and general populations [3, 4]. Some generic measures of HRQOL measure preference-based HRQOL on a utility scale (e.g. multi-attribute instruments) and are used to calculate quality-adjusted life years (QALYs) in health economic evaluations to inform decision making within the context of healthcare funding constraints [5]. Other non-preference-based PROMs are valuable as broad measures of health in clinical trials and population health monitoring. All PROMs must meet minimum standards for scientific quality. These standards are well established and are developed and used across research [6,7,8], professional bodies [9], and clinical trials and industry regulators [10].

A recent systematic review identified 89 generic PROMs for use in children and young people (aged ≤ 18 years old), including preference-based HRQOL instruments [1]. The Pediatric Quality of Life Inventory™ v4.0 Generic Core Scales (PedsQL GCS), a non-preference-based HRQOL instrument, is one of the most widely used generic PROMs for HRQOL assessment [11]. Using an integrated generic core and disease/symptom-specific modular approach, the conceptual framework for the Pediatric Quality of Life Inventory™ (PedsQL) was derived from an earlier HRQOL measure that was developed using a cancer patient cohort aged 8–18 years old [12, 13]. The PedsQL model includes several condition-specific modules (e.g. asthma, rheumatology, diabetes) and the PedsQL 4.0 Generic Core Scales (PedsQL GCS). The PedsQL GCS was developed as a generic measure and validated in a cohort that included healthy children and children with chronic conditions, including asthma, attention deficit hyperactivity disorder (ADHD), and diabetes [14, 15]. The Child Health Utility 9D (CHU9D) is a relatively recent preference-based HRQOL instrument that was developed specifically with and for children and includes value sets for different country contexts derived from adolescent populations to provide health utilities [16,17,18,19]. The Longitudinal Study of Australian Children (LSAC) administered both the PedsQL GCS and the CHU9D instruments to its participants and provided this study with the opportunity to evaluate the psychometric properties of these two instruments within the same sample of Australian children and adolescents.

Assessment of psychometric properties of HRQOL instruments is important to determine whether the instrument provides valid, reliable, and responsive measurement of the concept being assessed, i.e. HRQOL. However, many lack evidence of performance over a comprehensive range of psychometric properties [20, 21]. This lack of evidence may hinder appropriate instrument selection for the required context by researchers and clinicians, and raises potential issues such as the validity and reliability of outcome measurements and consequent clinical and policy decision making. Two systematic reviews identified evidence gaps in the psychometric performance of HRQOL instruments for children and young people, including the PedsQL and CHU9D [4, 22]. Moreover, Rowen and colleagues highlighted that good performance of preference-based instruments in a general population does not necessarily signal good performance in patient populations with specific clinical conditions [22]. The psychometric performance evidence base is more extensive for the PedsQL but relatively limited for the CHU9D [4, 22]. There is very little research that assesses the psychometric properties of the PedsQL and CHU9D in the same cohort, especially patient cohorts, to evaluate the performance of these instruments against a set of absolute criteria using the same dataset. Some studies have assessed the psychometric properties of the PedsQL GCS and validated translations of the CHU9D using the same study sample, namely in Denmark, China, and Sweden [23,24,25]. The paucity of evidence around the psychometric performance of the PedsQL GCS and CHU9D in Australian cohorts is also evident, with only two studies that have examined both instruments in Australian adolescent populations [26, 27]. However, both studies used cross-sectional data and assessed psychometric properties in general populations of adolescents aged 11–12 years and 15–17 years, respectively. 
The first study [26] only indirectly assessed psychometric properties of the PedsQL GCS and CHU9D, while the second study [27] used the Short Form 15 version of the PedsQL GCS, rather than the full instrument.

Given the gap in the evidence base for the psychometric performance of the generic PedsQL GCS and CHU9D instruments in patient populations of Australian children, the aim of this study was to assess the acceptability, reliability, validity, and responsiveness of the PedsQL GCS and the CHU9D for the measurement of HRQOL among children with common chronic conditions, using a longitudinal dataset.

2 Methods

2.1 Participants

This study was a secondary analysis of existing data from the LSAC [28]. The LSAC is a continuing population representative survey of children and their families that collects data on child wellbeing and development over the paediatric life course of the child, using separate parent-reported and child-reported questionnaires. The LSAC data collection phases (referred to as waves) started with the initial recruitment of two cohorts of children during 2004 (wave 1). Children and their parents have been interviewed every 2 years using a mixture of data collection methods and modes, i.e. self-complete, interviewer-administered, mail-out, in-person, telephone, and computer-assisted methods [28]. The most recent wave of data collection was in 2020 (wave 9). Wave 1 included two cohorts of children with 5107 in the ‘baby’ (B) cohort aged 0–1 years old and 4983 children in the ‘kindergarten’ (K) cohort aged 4–5 years old at baseline [29]. In this study, we used a subset of the LSAC that included data collected using both the PedsQL GCS (parent proxy-reported questionnaire) and CHU9D (child self-reported questionnaire) instruments from the B and K cohorts. The resulting longitudinal dataset included data from 2013 to 2018, and included the B cohort in which children were aged 10–11 (wave 6; 2013–2014), 12–13 (wave 7; 2015–2016), and 14–15 years old (wave 8; 2017–2018), respectively, and the K cohort in which children were aged 14–15 (wave 6; 2013–2014) and 16–17 years old (wave 7; 2015–2016).

2.2 HRQOL Instruments

2.2.1 PedsQL GCS

The PedsQL GCS is a non-preference-based generic PROM that was developed to measure childhood HRQOL [30]. Parent proxy-reported age-appropriate PedsQL GCS versions were used in the LSAC. For children aged 10–13 years old, the LSAC used the PedsQL GCS parent report for children (ages 8–12) version for all the relevant waves in the B and K cohorts, except for children aged 10–11 years (wave 4) in the K cohort, which used the PedsQL GCS parent report for young children (ages 5–7) version. For children aged 14–17 years, the PedsQL GCS parent report for teenagers (ages 13–18) version was used for all the relevant waves in both the B and K cohorts. The age-specific PedsQL GCS versions used in this study have the same scoring structure; however, some questions are worded differently to be age appropriate. The PedsQL GCS versions used in the study consist of 23 items within four summary functional subscales/domains, namely, physical (8 items), emotional (5 items), social (5 items), and school (5 items) [30]. Each item is scored on a 5-point scale (0 = never a problem; 1 = almost never a problem; 2 = sometimes a problem; 3 = often a problem; 4 = almost always a problem). To score the PedsQL GCS, the item scores are reverse scored (0 = 100, 1 = 75, 2 = 50, 3 = 25, 4 = 0) and transformed onto a linear scale, where the total scale score is the sum of the item scores divided by the number of items answered, and ranges from 0 to 100, with higher scores indicating better HRQOL [30].
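The scoring rule described above can be illustrated with a minimal sketch. The function name `pedsql_total_score` and the use of `None` for an unanswered item are assumptions made for illustration only; the sketch applies just the reverse scoring and averaging stated in the text and does not implement any additional rule for handling heavily incomplete questionnaires.

```python
from typing import Optional, Sequence

# Reverse-scoring map stated in the text: raw 0-4 response -> 0-100 score.
REVERSE_MAP = {0: 100, 1: 75, 2: 50, 3: 25, 4: 0}


def pedsql_total_score(items: Sequence[Optional[int]]) -> Optional[float]:
    """Return the 0-100 total score for one questionnaire.

    `items` holds raw responses (0-4); None marks an unanswered item
    (an illustrative convention, not taken from the source).
    """
    # Reverse score each answered item, skipping unanswered ones.
    answered = [REVERSE_MAP[r] for r in items if r is not None]
    if not answered:
        return None  # no items answered, so no score can be computed
    # Sum of the item scores divided by the number of items answered.
    return sum(answered) / len(answered)
```

For example, a respondent who answers "never a problem" to all 23 items scores 100, while a respondent who answers "almost always a problem" to all items scores 0.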

2.2.2 CHU9D

The CHU9D is a generic childhood preference-based HRQOL instrument designed to measure health utility. It was developed with children from the United Kingdom and initially validated for a target age of 7–11 years old in a general population sample and a clinical sample [17, 18]. The clinical sample included a wide range and severity of health conditions, with children recruited from a medical ward that covered acute and chronic medical conditions, surgical wards that covered a range of renal, gastrointestinal, neurological, orthopaedic, limb reconstruction, and spinal surgeries, and a day care unit that covered children undergoing procedures involving urology, gastroenterology, endocrinology, neurology, oncology, orthopaedics, general surgery, dental, ear, nose and throat, as well as allergy patients [17, 18]. It has also been validated in Australia in a sample of adolescents aged 11–17 years old from the general population [19, 31]. The CHU9D is self-reported by the child in the LSAC dataset and comprises nine dimensions (worried, sad, pain, tired, annoyed, schoolwork, sleep, daily routine, and activities), each of which is scored across five levels (0 = "It is never a problem"; 1 = "It is almost never a problem"; 2 = "It is sometimes a problem"; 3 = "It is often a problem"; 4 = "It is almost always a problem") [18]. A preference-based value set is applied to the participant responses to provide an overall health utility score anchored at 0 (usually indicating dead) and 1 (perfect health); the score can take negative values, which indicate health states considered worse than being dead. Value sets for the CHU9D have been developed for the United Kingdom, Australia, China, and the Netherlands [1]. In the LSAC dataset, the overall CHU9D utility scores were calculated using the Australian valuation algorithm developed in adolescents [32], which results in a utility scale that ranges from −0.1059 (poorest health) to 1 (perfect health).

2.3 Health Conditions

At each wave of data collection in the LSAC, the parent proxy reported the presence or absence of a range of health conditions for their child. While the LSAC includes children with various health conditions, this study focussed only on the children with parent proxy-reported presence of any of the six common chronic health conditions (asthma, anxiety/depression, ADHD, autism/Asperger’s, epilepsy, type 1 diabetes) and children without any of these six conditions. Children may have had multiple health conditions in LSAC; however, this study identified the presence of any of these six specified health conditions as an individual observation when analysing by each condition and did not account for children with multiple conditions. An end user group (EUG) for this study, which consisted of paediatric and non-paediatric clinicians, health economists, and decision makers (14 members in total), stated and ranked paediatric clinical health conditions in order of priority. The choice of these final six common chronic health conditions was based on a combination of the prioritisation feedback from the study EUG and the availability of data for the health conditions within the LSAC.

2.4 Sociodemographic Characteristics

Where appropriate, the statistical analyses controlled for sociodemographic characteristics of the participating children that are known predictors of childhood HRQOL, such as age [33, 34], gender [34], culturally and linguistically diverse (CALD) status [35], being Aboriginal or Torres Strait Islander [36], and socioeconomic position (SEP) [37, 38]. The sociodemographic characteristics were parent proxy-reported and coded in the analyses as follows: age (in years), sex (male/female), SEP (high/low), CALD status (CALD/not CALD), and Aboriginal or Torres Strait Islander (yes/no). The LSAC study measures SEP using a composite variable developed for the study, which combines education level, occupation type, and income of the child’s parents into a z-score [39]. The present analysis further categorised the SEP variable into low SEP [yes = SEP z-score < 0 (low SEP); no = SEP z-score ≥ 0 (high SEP)]. The variable that documented whether a language other than English was regularly spoken to the child at home was collected at age 2–3 years for the B cohort and age 4–5 years for the K cohort and was used as a proxy measure for CALD status in this study.

2.5 Evaluation of Psychometric Properties

For each instrument, we evaluated acceptability, reliability, validity, and responsiveness and compared them against criteria from psychometric best-practice guidelines and established standards [6,7,8,9,10, 40,41,42]. Except when assessing acceptability through the level of missing data, the psychometric evaluation for each health condition used complete observations for the specified condition, PedsQL GCS, and CHU9D. Acceptability, reliability, and convergent validity analyses were conducted separately for each of the six health conditions and also for children without any of the six conditions. Known group validity and responsiveness were assessed separately for children with each of the six health conditions.

Acceptability typically measures the quality of the data and is assessed by data completeness and distribution of scores, including floor and ceiling effects [10, 41]. Acceptability can also refer to the feasibility and practicality of using a particular instrument and may include measures of comprehension or burden of completion [10, 40]; however, these data were not available within the LSAC. Hence, acceptability was limited to assessment of missing data (missingness) and the proportion of ceiling and floor values for the PedsQL GCS total scores and CHU9D utility scores. Missing data levels of < 5% and proportions of floor and ceiling values < 10% are considered relevant thresholds for acceptability [41].
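As an illustration of the acceptability checks described above, the following sketch computes the percentage of missing scores and the percentages of scores at the floor and ceiling of a scale, and compares them with the < 5% and < 10% thresholds. The function name, the `None` convention for missing scores, and the choice of observed (non-missing) scores as the denominator for floor and ceiling percentages are assumptions made for illustration.

```python
import math
from typing import Dict, Optional, Sequence


def acceptability_summary(
    scores: Sequence[Optional[float]], scale_min: float, scale_max: float
) -> Dict[str, float]:
    """Summarise missingness and floor/ceiling percentages for one scale."""
    if not scores:
        raise ValueError("scores must contain at least one entry")
    observed = [s for s in scores if s is not None]
    missing_pct = 100.0 * (len(scores) - len(observed)) / len(scores)

    def pct_at(bound: float) -> float:
        # Percentage of observed scores equal to a scale bound
        # (tolerance guards against floating-point rounding).
        if not observed:
            return 0.0
        hits = sum(math.isclose(s, bound, abs_tol=1e-9) for s in observed)
        return 100.0 * hits / len(observed)

    floor_pct = pct_at(scale_min)
    ceiling_pct = pct_at(scale_max)
    return {
        "missing_pct": missing_pct,
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        # Thresholds from the text: < 5% missing, < 10% floor/ceiling.
        "missing_ok": missing_pct < 5.0,
        "floor_ok": floor_pct < 10.0,
        "ceiling_ok": ceiling_pct < 10.0,
    }
```

For the CHU9D utility score, `scale_min` and `scale_max` would be −0.1059 and 1; for the PedsQL GCS total score, 0 and 100.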

Reliability relates to the degree to which an instrument measurement is free from random error [7, 9]. The cohort dataset only enabled assessment of reliability in terms of internal consistency, which measures the inter-relatedness among items from the same scale [7, 42]. We assessed internal consistency for the four subscales of the PedsQL GCS, the total PedsQL GCS, and the total CHU9D scales using Cronbach’s alpha and item-total correlations. Cronbach’s alpha values ≥ 0.7 and item-total correlations ≥ 0.2 are considered minimum standards for internal consistency reliability [41,42,43,44].
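The two internal consistency statistics can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: it assumes complete item responses arranged as one row per respondent, uses the standard formula for Cronbach’s alpha, and computes the corrected item-total correlation (each item against the sum of the remaining items), which is one common variant; the text does not state which variant was used.

```python
from statistics import mean, variance
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def item_rest_correlations(rows: Sequence[Sequence[float]]) -> List[float]:
    """Correlate each item with the sum of the remaining items."""
    k = len(rows[0])
    out = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        out.append(pearson(item, rest))
    return out
```

Under the thresholds cited above, a scale would meet the minimum standard when `cronbach_alpha(rows) >= 0.7` and every value returned by `item_rest_correlations(rows)` is at least 0.2.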

Validity assesses whether the instrument measures the concept that it is intended to measure [7, 42]. This study assessed convergent validity and known group validity, both of which provide evidence of construct validity [7]. Convergent validity assesses the extent to which there is an association between the scale under investigation and other scales that measure the same or similar constructs [10, 40]. The PedsQL GCS and CHU9D are both established and accepted measures of the HRQOL construct, and as there was no other independent validated HRQOL instrument within the LSAC dataset, convergent validity was assessed using these instruments. This is not ideal as both instruments were under scrutiny in our study. However, given that this was a secondary analysis, and we were limited to the data collection undertaken by the LSAC, this test was better than having no test at all for convergent validity. Convergent validity was assessed using Spearman’s correlations between the PedsQL GCS total score and the CHU9D utility score for the children and adolescents in the study sample. Convergent validity was determined by the expected correlation and was categorised based on strength of the correlation coefficients, i.e. weak (< 0.40), moderate (0.41–0.60), good (0.61–0.80), and strong (> 0.80) [22]. Although both the PedsQL GCS and CHU9D measure HRQOL, the LSAC used only parent proxy-reported versions of the PedsQL GCS, while the CHU9D was child self-reported; hence, we hypothesised that there may be only moderate correlations between the two instruments.
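The convergent validity assessment can be sketched as below: Spearman’s correlation is computed as the Pearson correlation of average ranks, and the coefficient is then assigned to one of the strength bands listed above. The function names are illustrative. Because the published bands leave small gaps at their boundaries (for example, exactly 0.40), the cut-offs used here (< 0.40 weak, up to 0.60 moderate, up to 0.80 good, otherwise strong) are an assumption about how those edge cases would be handled.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def correlation_strength(rho: float) -> str:
    """Map a coefficient to the strength bands used in the text."""
    if rho < 0.40:
        return "weak"
    if rho <= 0.60:
        return "moderate"
    if rho <= 0.80:
        return "good"
    return "strong"
```

In this scheme, `x` would hold the parent proxy-reported PedsQL GCS total scores and `y` the child self-reported CHU9D utility scores for the same children.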

Known group validity assesses if the instrument can differentiate between clinically distinct groups [10]. We hypothesised that children with each of the six selected clinical conditions individually would have lower HRQOL compared to those without that condition. Known group validity was evaluated using generalised estimating equations (GEE) to account for the repeated measures of HRQOL and reported clinical condition among the same children, with adjustment for sociodemographic characteristics known to impact on HRQOL [45]. The PedsQL GCS total score, which is scored on a 0–100 scale, was transformed to a 0–1 scale so that the PedsQL GCS total score and CHU9D utility score variables within the GEE models used the same scale. The GEE models specified a binomial family with a log-link function to account for the distribution of the HRQOL scores being on a 0–1 scale in this study, and robust variance estimation. Separate GEE models were estimated for each of the six health conditions (asthma, anxiety/depression, ADHD, autism/Asperger’s, epilepsy, type 1 diabetes) for both the PedsQL GCS and CHU9D, where the effect of the presence of each selected condition was relative to the absence of that selected condition only (e.g. asthma vs no asthma only, ADHD vs no ADHD only); hence, the reference level sample differs for each of the individual conditions because health conditions other than the condition of interest may have been present. This resulted in six separate GEE models using the PedsQL GCS total scale score (transformed to a 0–1 scale) as the response variable and the presence of each of the six selected conditions individually as the univariate explanatory variable, adjusted for the sociodemographic characteristics (i.e. age, sex, whether participants identified as Aboriginal or Torres Strait Islander, CALD status, and SEP), and a further six GEE models were specified in the same way but with the CHU9D utility score as the response variable. 
Significance levels were set at p < 0.05 for main effects and p < 0.01 for interaction terms. Interaction terms of clinical condition and significant sociodemographic variables were explored; however, none were significant and only the main effects GEE models are reported. The main effects GEE models were used to predict the marginal effects of the clinical condition on HRQOL, i.e. the adjusted predicted mean PedsQL GCS total score transformed back to a 0–100 scale and mean CHU9D utility scores in the presence and absence of the individual clinical condition.
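The structure of each main effects model can be written out as a sketch. The notation below is introduced here for illustration and is not taken from the study: Y_ij is the HRQOL score on the 0–1 scale for child i at wave j, C_ij indicates the reported presence of the condition of interest, and x_ij collects the sociodemographic covariates. The working correlation structure is not specified in the text and is therefore left unstated, and the averaging over observed covariates in the second line is one common way of obtaining marginal predictions, assumed here rather than confirmed.

```latex
% Illustrative notation (assumed, not from the source):
%   Y_{ij}  HRQOL score on a 0-1 scale, child i, wave j
%   C_{ij}  1 if the condition of interest is reported, 0 otherwise
%   x_{ij}  age, sex, Aboriginal or Torres Strait Islander status, CALD status, SEP
\begin{align}
  \log \mathrm{E}\left[Y_{ij}\right]
    &= \beta_0 + \beta_1 C_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma}, \\
  \hat{\mu}(c)
    &= \frac{1}{N} \sum_{i,j}
       \exp\left(\hat{\beta}_0 + \hat{\beta}_1 c
                 + \mathbf{x}_{ij}^{\top}\hat{\boldsymbol{\gamma}}\right),
       \qquad c \in \{0, 1\}.
\end{align}
```

With a log link, exp(β₁) is the ratio of the expected score with the condition to the expected score without it, holding the covariates fixed; a value below 1 corresponds to lower HRQOL in the presence of the condition. The predicted PedsQL GCS means would be multiplied by 100 to return them to the 0–100 scale.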

Responsiveness assesses the ability of the instrument to detect change over time in the construct being measured [7] in relation to an intervention of known efficacy [22, 41]. We hypothesised a change in HRQOL (i.e. a change in the PedsQL GCS total score or CHU9D utility score) if there was a change in the status of the selected health conditions between consecutive waves of the LSAC data. Children were classified as to whether each of the selected clinical conditions persisted, resolved, or manifested between consecutive waves of the LSAC dataset. As autism/Asperger’s, epilepsy, and type 1 diabetes are lifelong conditions [46,47,48], they were not categorised as resolved as this was considered not to be clinically possible. The B and K cohorts provided data on the change in HRQOL total scores for individual children over 2-year intervals for the following three age group progressions: 10–11 to 12–13 years old, 12–13 to 14–15 years old, and 14–15 to 16–17 years old. We hypothesised that a child moving from absence to presence of the condition (e.g. no asthma to asthma) would result in a negative HRQOL score change, whilst moving from presence to absence of the selected conditions (e.g. asthma to no asthma) would result in a positive HRQOL score change, and no change in presence or absence of the condition would result in an HRQOL score change close to zero. Responsiveness was evaluated using effect sizes (ES), which account for the change in HRQOL score in relation to the standard deviation (SD) of the baseline score, and were calculated according to the method outlined by Pink and colleagues [49]. We used the commonly accepted, although arbitrary, categorisations of magnitude of ES being small (ES = 0.2), moderate (ES = 0.5), and large (ES ≥ 0.8) [44, 50].
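A minimal sketch of the effect size calculation is given below. It follows only the description in the text, i.e. the mean change in score between consecutive waves divided by the SD of the baseline scores; the exact formulation of Pink and colleagues [49] is not reproduced here, so this should be read as an assumed standard form. The function names and the labelling of values between the conventional anchors (0.2, 0.5, 0.8) are illustrative choices.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(baseline: Sequence[float], followup: Sequence[float]) -> float:
    """Mean within-child change divided by the SD of the baseline scores."""
    if len(baseline) != len(followup):
        raise ValueError("baseline and followup must be paired")
    changes = [f - b for b, f in zip(baseline, followup)]
    return mean(changes) / stdev(baseline)


def effect_size_magnitude(es: float) -> str:
    """Label the absolute ES using the conventional 0.2 / 0.5 / 0.8 anchors.

    Treating the anchors as lower bounds of each band is an assumption.
    """
    size = abs(es)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"
```

Under the hypotheses above, children whose condition manifested between waves would be expected to show a negative ES, those whose condition resolved a positive ES, and those with no change in status an ES close to zero.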

3 Results

3.1 Participants

Descriptive statistics for the study analysis sample are shown in Table 1. Across both cohorts and all age groups, data consisted of 15,568 observations from 7201 children with 9529 observations from 3760 children in the B cohort (‘baby’ cohort) and 6039 observations from 3441 children in the K cohort (‘kindergarten’ cohort).

Table 1 Descriptive statistics for study analysis sample of children and adolescents by health condition and age group

There were low numbers of children reported with epilepsy and diabetes (type 1) over all waves of data collection for the B and K cohorts. The distribution of sociodemographic characteristics varied across the six conditions, with a higher proportion of boys having ADHD or autism/Asperger’s compared with the other health conditions and the sample without any of these conditions.

3.2 Psychometric Properties

3.2.1 Acceptability

Acceptability of the PedsQL GCS was high, based on the overall low level of missingness (< 2%) for the total scores and no floor or ceiling effects across all samples of children with each of the six conditions and without any of the conditions, over all the age groups (see Supplementary Table S1 in the Electronic Supplementary Material [ESM]). Acceptability for the CHU9D was mixed. While missingness was < 5% for CHU9D utility scores for most age groups for asthma, type 1 diabetes, and children without any of the six conditions, missingness was > 5% for most age groups in children with epilepsy, ADHD, autism/Asperger’s, and anxiety/depression. There were no floor effects for the CHU9D utility total score. However, ceiling effects > 10% were observed for most of the age groups for children with each of the six conditions and without any of those conditions.

3.2.2 Internal Consistency Reliability

Among children and adolescents with each of the six conditions individually and without any of the six conditions, internal consistency reliability for the PedsQL GCS total score scale and the individual summary score subscales for physical health, emotional, social, and school functioning was strong (Cronbach’s alpha range 0.70–0.95; item-total correlations range 0.35–0.84) (Table 2). CHU9D utility scores also showed strong internal consistency reliability for children and adolescents without any of the six conditions and for each of the conditions individually (Cronbach’s alpha range 0.76–0.84; item-total correlations range 0.32–0.70), except for type 1 diabetes (Cronbach’s alpha = 0.65; item-total correlations range 0.09–0.56).

Table 2 Internal consistency results for PedsQL GCS and CHU9D among children aged 10–17 years old by clinical condition

3.2.3 Convergent Validity

Convergent validity between the PedsQL GCS and CHU9D among children with each of the six conditions individually and without any of the six conditions was weak overall (i.e. < 0.4 threshold), and the statistically significant Spearman’s correlation coefficients ranged from 0.13 to 0.30 (Table 3). Children aged 14–15 years old (K cohort) with type 1 diabetes were the exception (Spearman’s correlation coefficient = 0.62); however, this was a small sample (n = 9).

Table 3 Convergent validity (i.e. Spearman’s correlation coefficients) between PedsQL GCS total score (parent-reported) and CHU9D utility score (child-reported) for children aged 10–17 years old by clinical condition status and age group

3.2.4 Known Group Validity

The outputs from the main effects models using GEE for quality of life associated with the six clinical conditions are presented in Supplementary Table S4 in the ESM. These univariate analyses of the health conditions, adjusted for the selected sociodemographic variables, indicated that the coefficients for the presence of asthma, anxiety/depression, ADHD, and autism/Asperger’s were statistically significant, but those for epilepsy and type 1 diabetes were not.

Marginal predictions for the mean PedsQL GCS total score from the GEE main effects models adjusted for sex, age, identifying as Aboriginal or Torres Strait Islander, CALD status, and SEP (Fig. 1 and Supplementary Table S2 in the ESM), indicated that the PedsQL GCS discriminated (p < 0.05) between children with and without anxiety/depression, ADHD, autism/Asperger’s, and epilepsy, but not for asthma and type 1 diabetes.

Fig. 1 Known group marginal predictions for PedsQL GCS total score and CHU9D utility score from the 12 separate adjusted GEE models with 95% CIs for children aged 10–17 years old with and without the individual clinical conditions. Due to the large sample sizes, the 95% CIs are very precise and therefore not apparent in the figure. ADHD attention deficit hyperactivity disorder, CHU9D Child Health Utility 9D, CI confidence interval, GEE generalised estimating equation, PedsQL GCS Pediatric Quality of Life Inventory™ v4.0 Generic Core Scales

Marginal predictions for the mean CHU9D utility score from the GEE main effects models adjusted for sex, age, identifying as Aboriginal or Torres Strait Islander, CALD status, and SEP (Fig. 1 and Supplementary Table S3 in the ESM) indicated mixed known group validity for the CHU9D. The CHU9D only discriminated between children with and without anxiety/depression, ADHD, and autism/Asperger’s, but not for those with asthma, epilepsy, and type 1 diabetes. Note that the marginal predictions for the mean PedsQL GCS total score and mean CHU9D utility score varied for the reference level (i.e. without the condition) for each of the individual health conditions (Fig. 1 and Supplementary Table S3). This is because the effect of the presence of the selected condition in each GEE model was relative to the absence of that selected condition only; hence, the reference level sample differed for each of the individual conditions. Figure 1 does include the 95% confidence intervals (CIs) for the marginal prediction estimates; however, due to the large sample sizes, the 95% CIs are very precise and therefore not apparent in the figure.

3.2.5 Responsiveness

In Figs. 2 and 3, the ES estimates would be expected to increase from left to right for a responsive instrument, with a negative ES expected for the “worse” category, an ES close to zero for “same”, and a positive ES for “better”. The responsiveness of both the PedsQL GCS and CHU9D was variable for the six conditions across the available age groups in the B and K cohorts (Figs. 2 and 3).

Fig. 2 Responsiveness results for PedsQL GCS and CHU9D, i.e. effect sizes and 95% CIs for changes in condition status for asthma, anxiety/depression, and ADHD over the 3 age group progressions. The effect size point estimates would be expected to trend from left to right for a responsive instrument, with a negative effect size expected for the “worse” category, an effect size close to zero for “same”, and a positive effect size for “better”. ADHD attention deficit hyperactivity disorder, CHU9D Child Health Utility 9D, CI confidence interval, PedsQL GCS Pediatric Quality of Life Inventory™ v4.0 Generic Core Scales

Fig. 3 Responsiveness results for PedsQL GCS and CHU9D, i.e. effect sizes and 95% CIs for changes in condition status for autism/Asperger’s, epilepsy, and type 1 diabetes over the 3 age group progressions. The effect size point estimates would be expected to trend from left to right for a responsive instrument, with a negative effect size expected for the “worse” category and an effect size close to zero for “same”. CHU9D Child Health Utility 9D, CI confidence interval, PedsQL GCS Pediatric Quality of Life Inventory™ v4.0 Generic Core Scales

The responsiveness of the PedsQL GCS and CHU9D is consistent with the expected trend in ES for children with anxiety/depression in all three age group progressions, asthma in the 14–15- to 16–17-years-old progression, ADHD in the 10–11- to 12–13-years-old and 12–13- to 14–15-years-old progressions, autism/Asperger’s in the 10–11- to 12–13-years-old and 14–15- to 16–17-years-old progressions, and type 1 diabetes in the 10–11- to 12–13-years-old progression. The responsiveness of the PedsQL GCS is also consistent with the expected trend in ES for children with epilepsy for the 12–13- to 14–15-years-old progression, while that of the CHU9D is inconsistent for these children. The responsiveness of the CHU9D is consistent with the expected trend in ES for children with type 1 diabetes for the 12–13- to 14–15-years-old progression, while that of the PedsQL GCS is inconsistent for these children. The responsiveness of both the PedsQL GCS and CHU9D is inconsistent with the expected trend in ES for children with asthma for the 10–11- to 12–13-years-old and 12–13- to 14–15-years-old progressions, ADHD in the 14–15- to 16–17-years-old progression, autism/Asperger’s in the 12–13- to 14–15-years-old progression, and epilepsy in the 14–15- to 16–17-years-old progression.

For the responsiveness results that were consistent with the hypothesised direction of change for ES, most of the estimated ES were relatively small for both the PedsQL GCS and CHU9D. The responsiveness results in children with epilepsy and type 1 diabetes are likely underpowered due to the low numbers, and there were no observations in the “worse” category. The point estimates and 95% CIs for the PedsQL GCS and CHU9D ES by condition and age group progressions are presented in Supplementary Tables S5 and S6, respectively, in the ESM.

4 Discussion

This study investigated the psychometric properties of two generic childhood HRQOL instruments, the PedsQL GCS and CHU9D. This was conducted in the context of a large longitudinal cohort of Australian children and adolescents (aged 10–17 years old) with and without parent proxy-reported clinical conditions. The psychometric properties evaluated included acceptability, reliability, validity, and responsiveness, and the clinical conditions were asthma, anxiety/depression, ADHD, autism/Asperger’s, epilepsy, and type 1 diabetes. To our knowledge, this is the first time that a rigorous psychometric assessment of PedsQL GCS and CHU9D has been possible for a longitudinal cohort that includes reporting of a range of common chronic health conditions.

The findings of this study indicate that PedsQL GCS demonstrated good acceptability, while missingness and ceiling effects for the CHU9D exceeded acceptable levels. Overall, the PedsQL GCS and CHU9D both showed good internal consistency reliability in this cohort of children and provide users with confidence that the items within the summary subscales of PedsQL GCS and total score scales for the PedsQL GCS and CHU9D are internally consistent with the overall construct of HRQOL as intended. The low correlation between the PedsQL GCS total score and the CHU9D utility score in this study suggested poor convergent validity, even though both instruments are measuring the same construct, i.e. HRQOL. The evidence for known group validity suggested that PedsQL GCS demonstrated greater sensitivity than CHU9D at discriminating between children with and without the six common chronic conditions investigated. However, both instruments were able to detect significant differences for children with and without anxiety/depression, ADHD, and autism/Asperger’s, which provides confidence that both instruments are able to discriminate between groups of children based on the status of these conditions. This study also assessed responsiveness, which is not often assessed in children [22]. The responsiveness performance of both PedsQL GCS and CHU9D was variable for the six conditions across the selected age groups in the B and K cohorts, and the majority of the estimated ES were relatively small for both the PedsQL GCS and CHU9D.

A potential reason for the missing CHU9D data is that the instrument is child self-reported in the LSAC, and children with these clinical conditions may find it challenging to self-complete. The observed ceiling effects for the CHU9D in the children and adolescents in this study suggest that the utility scores are skewed and clustered towards the top of the scale, so the instrument may not discriminate well among participants at the higher end of the scale. A multi-comparison cross-sectional study of Australian children aged 11–12 years old also reported a skewed distribution of HRQOL measured by the CHU9D, i.e. more children reported high HRQOL scores [26]. However, another Australian study found no evidence of ceiling effects for the CHU9D in a community-based sample of adolescents aged 11–17 years old [51].
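
The two acceptability indicators discussed above can be illustrated with a minimal sketch. It assumes that missingness is the percentage of respondents with no score, and that the ceiling effect is the percentage of observed scores at the maximum of the scale (1.0 for a utility score). The function name, the toy data, and the 15% flag are illustrative assumptions, not the criteria or data used in this study.

```python
from typing import Optional, Sequence


def acceptability_summary(
    scores: Sequence[Optional[float]], max_score: float = 1.0
) -> dict:
    """Return percentage missing and percentage at ceiling for one instrument.

    `scores` holds one entry per respondent; None marks a missing score.
    The ceiling percentage is computed among observed (non-missing) scores.
    """
    n_total = len(scores)
    if n_total == 0:
        raise ValueError("scores must not be empty")
    observed = [s for s in scores if s is not None]
    n_missing = n_total - len(observed)
    n_ceiling = sum(1 for s in observed if s >= max_score)
    pct_ceiling = 100.0 * n_ceiling / len(observed) if observed else float("nan")
    return {
        "pct_missing": 100.0 * n_missing / n_total,
        "pct_ceiling": pct_ceiling,
    }


# Hypothetical utility scores for ten respondents (not study data).
toy_scores = [1.0, 1.0, 0.92, None, 0.85, 1.0, 0.78, None, 1.0, 0.95]
summary = acceptability_summary(toy_scores)

# Illustrative threshold only; a 15% cut-off is a common convention,
# but the criterion applied in a given study should be taken from its methods.
ASSUMED_CEILING_THRESHOLD = 15.0
ceiling_flag = summary["pct_ceiling"] > ASSUMED_CEILING_THRESHOLD
print(summary, ceiling_flag)
```

When a large share of observed scores sits at the maximum, respondents at the top of the scale cannot be distinguished from one another and cannot register improvement, which is why ceiling effects matter for both discrimination and responsiveness.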

The poor convergent validity between the PedsQL GCS and CHU9D likely reflects method variance arising from the PedsQL GCS being parent proxy-reported and the CHU9D being child self-reported in the LSAC dataset. A systematic review concluded that there were differences in the level of agreement between proxy-reported and self-reported HRQOL in children and adolescents [52]. Moreover, in their systematic review, which included preference- and non-preference-based measures, Jardine and colleagues [53] noted that children and parents differ in their perception of HRQOL, particularly for subjective domains (e.g. emotional, psychosocial). This further supports the potential contribution of a proxy-reported instrument to the poor correlation noted between the PedsQL GCS and CHU9D in this study. Although both instruments are purported to measure the same construct, i.e. HRQOL, the PedsQL GCS and CHU9D have different numbers of items (23 and 9 items, respectively) and different scoring systems, and may therefore capture somewhat different aspects of HRQOL. Each instrument may also include different types of items, which means that different aspects of the construct are measured, thereby potentially reducing the association observed in the convergent validity assessment. The poor convergent validity between the PedsQL GCS and CHU9D also highlights the differences in the purpose of these instruments and their content validity, which are important considerations for instrument selection. The PedsQL GCS is designed to measure HRQOL on an unweighted scale, while the CHU9D is a preference-based HRQOL instrument designed to measure health utility that is weighted by preferences. They also differ in the instrument development process. The PedsQL GCS builds on previous iterations of instrument development, and only the initial PedsQL measurement model was developed in a cancer patient cohort aged 8–18 years old [12, 30]. While the CHU9D has been validated for use in general population adolescents aged 11–17 years old, it was initially developed and validated in two paediatric samples (aged 7–11 years old), i.e. a general sample and a clinical sample that included a wide range of health conditions and severity [17,18,19, 31]. Although not formally evaluated in this study, the choice between the two instruments should also take account of the content validity and original purpose of the instrument.
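
As a rough illustration of how convergent validity between two instruments can be quantified, the sketch below computes a Pearson correlation between paired scores from the same children. The choice of Pearson (rather than, for example, a rank correlation), the function name, and the toy values are assumptions for illustration and are not taken from this study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences of paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired scores for five children (not study data):
# a proxy-reported total score (0-100) and a self-reported utility score (0-1).
proxy_total = [60.0, 75.0, 80.0, 90.0, 95.0]
self_utility = [0.70, 0.95, 0.80, 0.85, 1.00]

r = pearson_r(proxy_total, self_utility)
print(f"r = {r:.2f}")
```

Because the two score vectors come from different informants, a low coefficient can reflect informant disagreement as well as genuine differences in what the instruments measure, which is the interpretation offered in the paragraph above.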

Known group validity and responsiveness are important properties for the measurement of HRQOL in clinical trials, as demonstrating intervention effectiveness is contingent on being able to detect meaningful differences between groups and changes over time because of the intervention. By using a single cohort, this study demonstrated that for known group validity the PedsQL GCS was more sensitive than the CHU9D at discriminating between the same children and adolescents with and without the health conditions, with larger differences in HRQOL scores estimated for the PedsQL GCS than for the CHU9D for most of the health conditions except anxiety/depression and asthma. The larger number of items contributing to the PedsQL GCS total score, and/or the greater relevance of its questions to the impacts of these health conditions, may account for the PedsQL GCS demonstrating stronger known group validity than the CHU9D. The variable responsiveness found for the CHU9D may be due to the ceiling effects, as evidenced by the clustering of the utility scores at the higher end of the scale, which may impair its ability to detect change over time. This evaluation raises the question of whether the CHU9D is sensitive enough to detect changes in condition status over time in the common health conditions in this study. This issue has potential implications for the use of the CHU9D to calculate QALYs, which is relevant to cost-utility analyses in economic evaluations.
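
To make the notion of an effect size concrete, the sketch below computes a standardised mean difference between two groups (Cohen's d with a pooled standard deviation). This is one common definition and is assumed here for illustration; it is not necessarily the ES formula used in this study, and the function name and toy scores are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Standardised mean difference (group_a minus group_b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Pool the sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("effect size is undefined when there is no variability")
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical total scores (not study data) for children without and with
# a given condition.
without_condition = [85.0, 90.0, 78.0, 92.0, 88.0]
with_condition = [70.0, 82.0, 75.0, 68.0, 80.0]

d = cohens_d(without_condition, with_condition)
print(f"d = {d:.2f}")
```

By Cohen's conventional benchmarks, values around 0.2, 0.5, and 0.8 are often described as small, medium, and large. The same standardisation idea applies to responsiveness, where the numerator is a change over time rather than a difference between groups.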

4.1 Strengths and Limitations

Strengths of this study include its large, diverse, and population-representative sample (n = 15,568 observations from 7201 children) drawn from longitudinal data, which enabled the assessment of multiple psychometric properties for the PedsQL GCS and CHU9D in a single cohort of Australian children and adolescents. The longitudinal design allowed us to investigate responsiveness, which is not often possible. The psychometric methods used were rigorous and based on established practice guidelines and criteria.

This study had limitations. There is potential for reporting bias, as the health conditions were proxy-reported by the child’s parent, were reported without information on severity, and were not supported by clinical or other data. Additionally, the PedsQL GCS data were limited to parent proxy-report, as the LSAC did not use the child self-report versions, while the CHU9D was child self-reported. This may have affected the broad comparison between the two instruments against the absolute set of psychometric assessment criteria. As this was a secondary analysis of an existing LSAC dataset, some of the analyses and methods were constrained by the availability of variables and measures in that dataset.

Patient burden and comprehension, and test–retest reliability, could not be assessed using the LSAC data. These properties are important to assess in future studies of the PedsQL GCS and CHU9D. Future work within the modern psychometric paradigm (e.g. based on Rasch measurement theory) could examine the extent to which the data for each health condition show differential item functioning by age; this was beyond the scope of this study, which adhered to the classical psychometric paradigm. With the PedsQL GCS being one of the most commonly used paediatric PROMs, there is also the potential for future research to build on its psychometric evidence base and develop it into a preference-based HRQOL instrument for use in economic evaluations.

5 Conclusion

This study provides a valuable contribution to the evidence base of psychometric performance of the PedsQL GCS and CHU9D in a large representative cohort of Australian children and adolescents with common chronic health conditions. The PedsQL GCS (parent proxy-report) instrument demonstrated consistent psychometric robustness against a set of absolute criteria for the psychometric properties assessed in this study. The CHU9D (child self-report) also demonstrated good psychometric performance; however, it was more variable in its performance with some of the psychometric properties such as acceptability and known group validity. Evidence and discussion from this study can aid in appropriate HRQOL instrument selection for the required context by researchers and clinicians, to ensure that clinical and policy decision-making outcomes are based on consistent and robust research and evaluations.