Key Points for Decision Makers

Child–parent agreement at the individual dimension level was higher for the CHU9D than for the PedsQL™. In contrast, agreement for overall HRQoL was lower for the CHU9D relative to the PedsQL™.

In general, younger children (6–7 years) showed agreement with parental proxies comparable to that of their older counterparts, providing some evidence that they may be able to meaningfully self-report.

1 Introduction

Health-related quality of life (HRQoL) is a key component of evaluating health outcomes to determine the value of health technologies, and a crucial indicator for appraising their quality [1, 2]. HRQoL measures can be broadly categorised into two main types: preference-based and non-preference-based. The primary distinction between the two is that the former measures generate health state utilities [3]. This allows for the calculation of Quality-Adjusted Life Years (QALYs), a key outcome measure in cost-utility analysis (CUA), widely used by healthcare decision-makers globally to inform reimbursement decisions for healthcare interventions and services [4].

The Child Health Utility 9D (CHU9D) is a preference-based HRQoL measure for application with children and young people and has been validated for the age group of 7–17 years. It is the only measure amongst the nine generic preference-based HRQoL measures that was designed exclusively for this population [5, 6]. The CHU9D has an adolescent specific value set available, facilitating the calculation of utilities based on the HRQoL preferences of young people themselves [7].

The Paediatric Quality of Life Inventory 4.0 (PedsQLTM 4.0) Generic Core Scales is a generic HRQoL measure developed for use in both healthy and patient populations of children and adolescents [8, 9]. The PedsQLTM has demonstrated reliability and validity as a self-report measure in children as young as 5–7 years old [8]. Since it is currently a non-preference-based measure, it is not possible to calculate utilities for the purposes of applying PedsQLTM in economic evaluation. However, the instrument has been widely applied and recognised as a valuable tool for measuring HRQoL in a variety of paediatric populations in both clinical and research settings [10].

Self-reporting of a person’s HRQoL from their own perspective is preferable wherever possible. However, self-report is often challenging in child health research, especially for children with severe health conditions, very young children and children with intellectual impairments/developmental delays [11, 12]. Hence, it is common for parents to serve as proxy respondents when assessing the HRQoL of children [13, 14]. While parents can provide valuable information about their child’s HRQoL, it is important to note that they may not always have the same perception as the child [15]. Previous research has reported discrepancies between the child’s self-reported HRQoL and that reported by their parents [16, 17, 18]. It is therefore crucial to evaluate how closely the report provided by the parents aligns with the child’s self-report to determine the extent to which the parental report is representative of the child’s own HRQoL.

In their review of parent–child reports of HRQoL, predominantly using the PedsQLTM, Eiser and Varni [15] reported that the level of agreement between parents and children may be influenced by several variables. Potential factors identified as contributing to limited parent–child agreement included the type of dimension assessed [15]. Similar to the findings in the studies assessing self and proxy concordance in the reporting of HRQoL within the adult population [19, 20], dimensions associated with objective aspects of health typically showed higher agreement as compared with the more subjective (emotional or social) dimensions [15, 16]. A recent systematic review of self and proxy reporting of generic preference-based paediatric HRQoL measures by our team identified 17 studies reporting dimension-level agreement in children with and without health conditions. In contrast with more observable HRQoL dimensions relating to physical health and functioning, the agreement was observed to be lower for psychosocial-related dimensions (e.g. ‘emotion’ and ‘pain’ attributes of the Health Utilities Index Mark 2/3 or the ‘having pain or discomfort’ and ‘feeling worried, sad, or unhappy’ dimensions of the EQ-5D-Y) [17].

The age of the child is another important factor that may impact the child–parent agreement in the assessment of child HRQoL. However, the role of age is not yet clearly understood with inconsistent results reported for different age groups [15, 17]. A study by Cremeens and colleagues suggested that the age of the child may influence the level of agreement for the PedsQLTM and may interact with the specific dimension being assessed [21]. In a sample of healthy children aged 5.5–8.5 years, they reported a significant agreement between older children (7.5–8.5 years) and parents for overall HRQoL. However, at the dimension level, a significant agreement was observed for the younger children (5.5–6.5 years) within the physical health dimension and for the older children within the psychosocial dimensions (7.5–8.5 years) [21]. To date, the differential effect of age on agreement remains largely unexplored, particularly using preference-based measures.

The main objective of this study was to examine the level of child–parent agreement in reporting the HRQoL of children aged 6–12 years using the CHU9D (a preference-based measure of children’s HRQoL) and the PedsQL™ (a non-preference-based measure of children’s HRQoL) in a community-based sample of Australian children. A secondary objective was to explore the impact of age on child–parent agreement across the dimensions of the two measures.

2 Methods

2.1 Participants and Study Design

Participants for the study were recruited through a partnership with an independent research company, Stable Research Australia. Parents who had previously expressed interest in participating in research studies were sent an invitation letter with details about the study. Children aged 6–12 years, along with their parents, were eligible to participate in this cross-sectional study provided the child was able to read and understand written English and did not have reading disorders or any other condition that would impact their ability to self-complete the measures.

Participants were selected using a proportional stratified random sampling method to ensure a representative sample of the general population in terms of socio-demographic characteristics and common health conditions affecting children, including asthma, anxiety disorders, conduct disorders, depressive disorders, autism spectrum disorders (ASD) and dental caries [22, 23]. To estimate Gwet’s Agreement Coefficient (AC) between two raters with an acceptable error margin of 20%, a minimal sample size of N = 25 is necessary [23]. Therefore, the study aimed to achieve a sufficiently large (N > 25) and representative sample for a robust statistical analysis of child-parent agreement.

Parents provided information about the child’s age, gender and presence of any long-term health condition/s. Additionally, the parents were also asked about their own socio-demographic characteristics including age, gender and postcode. Written informed consent to participate in the study was sought from the parent on behalf of the child prior to commencing the interview.

The study was approved by the Flinders University’s Human Research Ethics Committee (Project ID 4178).

2.2 Procedure

Semi-structured, face-to-face interviews were conducted in April 2021, at Flinders University in South Australia. Child–parent dyads from the community consenting to participate in the study were invited. During the interview, the child was asked to complete the CHU9D and PedsQLTM, and a single-item self-rated general health question, administered via an online platform (REDCap).

The parent completed hard copy (paper and pen) proxy versions of the CHU9D and the PedsQL™ in the same interview room as their child while wearing noise-cancelling headphones, to prevent their responses from being influenced by any conversations between the interviewer and the child. The parent was also invited to complete an assessment of their own HRQoL using the EQ-5D-3L measure and the single-item self-rated general health question. Online and paper-and-pen administrations are equivalent [24] as long as the mode is consistent for each rater [25]. The respective method for each rater was chosen as a matter of convenience and resource availability.

2.3 Measures

2.3.1 CHU9D

The CHU9D, a validated generic preference-based measure of children’s HRQoL, includes nine dimensions (“Worried”, “Sad”, “Pain”, “Tired”, “Annoyed”, “Schoolwork/homework”, “Sleep”, “Daily routine” and “Activities”), each with five response levels. A scoring algorithm can be used to generate individual-level utilities for all possible response combinations to the CHU9D. These utilities are required for the calculation of QALYs for economic evaluation. The utilities range from −0.1059 for the most severe (PITS) state to 1 (full health) [7]. An Australian adolescent-specific preference-based scoring algorithm, derived from Australian adolescents aged 11–17 years, was applied in this study to calculate the CHU9D utilities [7].

2.3.2 PedsQL™ 4.0 generic core scales

The PedsQL™ 4.0 Generic Core Scales include 23 items that are grouped into four scales (dimensions): physical functioning (8 items), emotional functioning (5 items), social functioning (5 items) and school functioning (5 items). The psychosocial dimensions represent the emotional, social and school functioning scales of the PedsQL™, whilst the physical dimension represents the physical functioning scale. Since the PedsQL™ does not take into account preferences, equal weights are assigned to each of its 23 items when calculating the total score. Items were reverse-scored and transformed onto a 0–100 continuous scale (0 = 100, 1 = 75, 2 = 50, 3 = 25, 4 = 0), such that higher scores denoted better HRQoL. Each individual Scale score was calculated as the sum of its item scores divided by the number of items answered. The average individual Scale scores were used to compute a total summary score [8].
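The scoring rule described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code or the official PedsQL scoring routine: the function names, the use of `None` to mark a missing response and the absence of any missing-data threshold are assumptions made for this example, and the total is taken here as the equal-weight mean over all answered items.

```python
from statistics import mean

# Scale layout as described in the text: 23 items in four scales.
SCALE_ITEMS = {"physical": 8, "emotional": 5, "social": 5, "school": 5}


def transform_item(raw):
    """Reverse-score a 0-4 response onto 0-100 (0 -> 100, ..., 4 -> 0)."""
    if raw not in (0, 1, 2, 3, 4):
        raise ValueError(f"response must be an integer from 0 to 4, got {raw!r}")
    return 100 - 25 * raw


def scale_score(raw_items):
    """Sum of transformed items divided by the number of items answered.

    None marks a missing response (an assumption of this sketch).
    Returns None when no item in the scale was answered.
    """
    answered = [transform_item(r) for r in raw_items if r is not None]
    return mean(answered) if answered else None


def total_score(responses):
    """Equal-weight mean over every answered item across the four scales.

    `responses` maps each scale name to its list of raw 0-4 responses.
    """
    answered = []
    for scale, n_items in SCALE_ITEMS.items():
        items = responses[scale]
        if len(items) != n_items:
            raise ValueError(f"{scale} expects {n_items} items, got {len(items)}")
        answered.extend(transform_item(r) for r in items if r is not None)
    return mean(answered) if answered else None


# Hypothetical respondent, invented purely for illustration.
example = {
    "physical": [0] * 8,
    "emotional": [4] * 5,
    "social": [2] * 5,
    "school": [1, None, 1, 1, 1],
}
physical = scale_score(example["physical"])
school = scale_score(example["school"])
total = total_score(example)
```

For the invented respondent, the physical scale scores 100, the school scale scores 75 (the missing item is simply left out of the denominator), and the total is 1350/22, or about 61.4.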

2.3.3 EQ-5D-3L

The EQ-5D-3L measures HRQoL across five dimensions: “mobility”, “self-care”, “usual activities”, “pain/discomfort” and “anxiety/depression”. Each dimension has three different response options, ranging from no problems to severe problems [26]. An Australian adult scoring algorithm was applied to calculate the adult utilities.

2.4 Statistical Analysis

Data were analysed using Stata (16.1, StataCorp LLC, College Station, TX) [27]. Differences in self-reported and proxy-reported CHU9D utilities and PedsQL™ scores and inter-rater agreement were examined both for the overall sample and by age group (6–7 years, 8–10 years and 11–12 years). The inter-rater differences and overall concordance were also examined for the subgroups categorised by the presence or absence of a health condition (yes/no) and parent gender (female/male). The Wilcoxon matched-pairs signed-rank test was used to compare the differences in child and proxy-reported overall HRQoL. Child–parent agreement was estimated using the concordance correlation coefficient (CCC) for continuous data, e.g. CHU9D utilities, due to the non-normal distribution of the data [28]. Gwet’s AC1 was used to analyse agreement for categorical data, e.g. CHU9D dimension-level HRQoL [29]. Agreement was compared between CHU9D dimensions and overlapping PedsQL™ item/s representing the corresponding CHU9D dimensions [30]. The statistical significance level was set at 0.05.
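For readers unfamiliar with the CCC, the Python sketch below computes a concordance correlation coefficient in the form usually attributed to Lin, using population (1/n) variances and covariance. It is offered as an illustration under that assumption; the study's analysis was run in Stata, and the function name and the toy values are hypothetical.

```python
def concordance_ccc(x, y):
    """Concordance correlation coefficient for paired continuous ratings.

    ccc = 2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2),
    with variances and covariance computed using a 1/n divisor.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denom = var_x + var_y + (mean_x - mean_y) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both raters give one identical constant")
    return 2 * cov_xy / denom


# Toy paired ratings, invented for illustration (not study data).
child = [1.0, 2.0, 3.0]
parent_shifted = [2.0, 3.0, 4.0]   # perfectly correlated but systematically higher
ccc_identical = concordance_ccc(child, child)
ccc_shifted = concordance_ccc(child, parent_shifted)
```

The shifted example shows why the CCC is stricter than a correlation coefficient: the two series are perfectly correlated, yet the constant offset between raters pulls the CCC down to 4/7 (about 0.57).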

Both CCC and AC1 take values between −1 and 1, with higher values indicating better agreement. The agreement results were interpreted using Altman’s scale, which categorises agreement less than or equal to 0.2, 0.4, 0.6, 0.8 and 1 as poor, fair, moderate, good and very good, respectively [29]. A weighted version of Gwet’s agreement coefficient (AC2) accounts for partial agreement in adjacent categories, allowing the measure to capture the varying degrees of agreement between the child–parent dyad [23]. The results for the weighted AC2 using linear weights have been provided in the Supplementary Information (Table S1).
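A minimal Python sketch of the unweighted AC1 for two raters, together with the Altman cut-offs quoted above, is given below. It assumes the standard two-rater form of Gwet's coefficient, in which chance agreement is estimated from the average of the two raters' marginal proportions; the function names and the toy ratings are hypothetical, and the weighted AC2 is not covered.

```python
from collections import Counter


def gwet_ac1(rater_a, rater_b, categories):
    """Unweighted Gwet's AC1 for two raters over a fixed set of categories.

    AC1 = (pa - pe) / (1 - pe), where pa is the observed proportion of
    agreement and pe = sum_k pi_k * (1 - pi_k) / (q - 1), with pi_k the
    mean of the two raters' marginal proportions for category k.
    """
    n = len(rater_a)
    q = len(categories)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be paired and non-empty")
    if q < 2:
        raise ValueError("at least two categories are required")
    pa = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    pe = sum(
        pi * (1 - pi)
        for pi in ((count_a[k] + count_b[k]) / (2 * n) for k in categories)
    ) / (q - 1)
    return (pa - pe) / (1 - pe)


def altman_label(coefficient):
    """Map an agreement coefficient to the Altman categories used in the text."""
    for upper, label in ((0.2, "poor"), (0.4, "fair"), (0.6, "moderate"), (0.8, "good")):
        if coefficient <= upper:
            return label
    return "very good"


# Toy two-category ratings, invented for illustration (not study data).
child = [1, 1, 1, 1, 2]
parent = [1, 1, 1, 2, 1]
ac1 = gwet_ac1(child, parent, categories=(1, 2))
label = altman_label(ac1)
```

In the toy example the raters agree on three of five items (pa = 0.6) and chance agreement is 0.32, giving an AC1 of about 0.41, which falls in the moderate band.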

The Socio-Economic Indexes for Areas-Index of Relative Socioeconomic Disadvantage (SEIFA-IRSD) was used to estimate the socio-economic status of the participants based on information provided from the 2011 Australian Census using the residential postcodes. The SEIFA-IRSD deciles measure the relative disadvantage of an area [31]. The SEIFA-IRSD deciles were grouped into quintiles, with the first six deciles categorised as disadvantaged areas (quintiles 1–3) and the last four as advantaged areas (quintiles 4–5).
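The grouping just described amounts to a simple mapping from decile to quintile to a binary category, sketched below in Python. The function names are hypothetical, and looking up a decile from a residential postcode is not shown, since that requires the published SEIFA tables.

```python
def seifa_quintile(decile):
    """Collapse an IRSD decile (1-10) into a quintile (1-5): deciles 1-2 -> 1, ..., 9-10 -> 5."""
    if decile not in range(1, 11):
        raise ValueError(f"decile must be an integer from 1 to 10, got {decile!r}")
    return (decile + 1) // 2


def seifa_group(decile):
    """Binary grouping from the text: quintiles 1-3 disadvantaged, quintiles 4-5 advantaged."""
    return "disadvantaged" if seifa_quintile(decile) <= 3 else "advantaged"


quintiles = [seifa_quintile(d) for d in range(1, 11)]
groups = [seifa_group(d) for d in range(1, 11)]
```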

3 Results

3.1 Child–Parent Participant Characteristics

A total of 89 child–parent dyads were identified as eligible and invited to participate in the study. Of those, four dyads were unable to attend the interview at the scheduled time, resulting in a response rate of 96% (N = 85). The children in the sample had an average age of 9 years (range 6–12 years) and the majority (56%) were female. The parents in the sample had an average age of 41 years (range 29–53 years) with the vast majority (81%) being child–mother dyads (Table 1). Most parents and children rated their own health as good to excellent on the self-rated general health question. This was further supported by the EQ-5D-3L measure, where parents reported a mean utility of 0.87 [standard deviation (SD) = 0.01]. Just under one-third (31%) of the children in the sample were identified by their parents as living with one or more health conditions (Table 1). A proportion of the study participants (37%) resided in areas with relative socio-economic disadvantage.

Table 1 Sociodemographic characteristics for all study participants (children and proxies)

3.2 Child–Parent Difference in Reported HRQoL and Overall Concordance

Table 2 describes the child and parent reported HRQoL scores and the dyad agreement using the CHU9D and the PedsQLTM, respectively. Overall, parents underreported children’s HRQoL with the CHU9D but overreported with the PedsQLTM. The difference in medians across the age groups was the largest for ages 11–12 years with the CHU9D and ages 6–7 years with the PedsQLTM. However, these differences were not found to be statistically significant.

Table 2 Description of child and proxy reported HRQoL values and agreement using CHU9D and the PedsQL™ 4.0 generic core scales

The overall agreement between child–parent dyads for both measures was fair with a lower agreement for CHU9D (0.28) (Fig. 1a) than for the PedsQLTM (0.39) (Fig. 1b). The agreement between parents and 8–10-year-olds was good for both measures. For overall HRQoL, this was the only age group that demonstrated a statistically significant level of agreement across both measures.

Fig. 1 a Concordance between child and parent reported HRQoL utilities using the CHU9D. b Concordance between child and parent reported HRQoL scores using the PedsQL™

Descriptive analysis indicated that the largest difference in medians in HRQoL ratings between children and proxies across the subgroups was observed in children without any reported health conditions using the CHU9D, while the PedsQLTM also demonstrated a notable inter-rater gap within this subgroup. However, these differences were not statistically significant. Within the same subgroup, a lower agreement between child–parent dyads was also observed with both measures. Additionally, in comparison to the mother–child dyads, father–child dyads exhibited a lower agreement with the CHU9D but higher agreement with the PedsQLTM (Supplementary Information Table S2).

3.3 Comparison of Agreement for CHU9D Dimensions and PedsQL™ Items

Table 3 presents the agreement coefficients (AC1) for the CHU9D dimensions and the corresponding representative PedsQL™ items, for the overall sample and by age group. Child–parent agreement ranged from 0.31 to 0.83 for the CHU9D dimensions and from 0.15 to 0.75 for the relevant PedsQL™ items. The agreement was higher for CHU9D dimensions than for the corresponding PedsQL™ items. Among the dimensions related to subjective (internal) experiences, agreement was the highest for ‘sad’ (CHU9D = 0.83) and ‘feeling sad’ (PedsQL™ = 0.37) within the respective measures. The agreement was high for ‘pain’ (0.73) with the CHU9D, whereas its equivalent item in the PedsQL™ showed the lowest (poor) agreement (0.15) compared with all other items within the measure. The weakest agreement across the CHU9D dimensions was observed for ‘tired’ (0.31) followed by ‘worried’ (0.45). In addition to the items related to psychosocial health mentioned above, poor agreement was also observed for the PedsQL™ item ‘having trouble sleeping’ (0.16). For the physical functioning related dimensions, agreement ranging from moderate to good was observed with both the CHU9D and the PedsQL™.

Table 3 Comparison of child–parent agreement in CHU9D dimensions with relevant PedsQL™ items by age group

Across the age groups, for the CHU9D dimensions, the only statistically non-significant agreement was observed between parents and children aged 6–7 years for ‘tired’ (0.19). Moreover, for most dimensions, the agreement was lower for the 6–7-year-olds. In contrast, agreement across the majority of the relevant PedsQL™ items was higher for the youngest age group (6–7 years) relative to the older age groups (8–10 and 11–12 years). Furthermore, statistically non-significant agreement was observed for several non-physical health-related items such as ‘having low energy level’, ‘feeling angry’ and ‘having trouble sleeping’ in both of the older age groups. These age groups also demonstrated poor agreement for the ‘getting aches and pain’ item. Statistically non-significant agreement was also seen between parents and 11–12-year-olds for the ‘worrying what will happen to them’ item.

4 Discussion

This study is the first, to our knowledge, to investigate child–parent agreement on overall and dimension-level child HRQoL in a community-based sample of children using two generic HRQoL measures, the CHU9D and PedsQL™ 4.0. This study showed contrasting agreement for overall and dimension-level HRQoL using the two measures. The agreement between parents and children for overall HRQoL scores was stronger for the PedsQL™ than for the CHU9D. Conversely, agreement for the individual dimensions was stronger for the CHU9D compared with the PedsQL™ items.

The discrepancy in the consistency of agreement may be attributed, at least in part, to the statistical method used to measure the agreement. This study used two different methods to estimate agreement between the child and parent ratings: CCC for overall HRQoL and Gwet’s AC1 for dimension level HRQoL. Utilities or summary scores combine responses from different dimensions to estimate the overall HRQoL of the child. However, when analysing inter-rater agreement, the dimension/item level responses can offer a more direct measurement of agreement as it provides the disaggregated evaluations of the two raters, i.e. the child and the parent. This may be more informative regarding the specific areas of agreement or disagreement between the child and the parent and, therefore, provides a better understanding of the concordance in evaluations of each aspect of HRQoL. Furthermore, the estimation of CCC in this study may have been affected by an increased level of variation in ratings resulting from the high number of rater pairs, which could have potentially led to an underestimation of the true magnitude of the CCC [23].

The inter-rater differences in HRQoL scores across age groups using both measures did not correspond with the trends in agreement observed at the individual dimension level. For instance, in comparison with the other age groups, the 11–12 years age group had the greatest inter-rater gap with the CHU9D utilities. However, the dimension level agreement was similar across age groups. Additionally, while the same age group had the smallest inter-rater difference with the PedsQLTM summary scores, they demonstrated lower agreement levels across most of its items compared with the youngest age group. Hence, it is important to acknowledge that the differences in the aggregated child and proxy reported HRQoL scores do not provide a measure of agreement [17].

Towards the opposite end of the age spectrum, a recent systematic review investigated the level of agreement between adult proxies and older adults with cognitive impairment [32]. Their findings indicated that there was some evidence suggesting higher levels of agreement in more observable HRQoL dimensions, such as physical health and mobility, compared with less observable dimensions like emotional well-being [32]. Typically, the available evidence indicates that parents also tend to be more concordant at reporting HRQoL dimensions related to the more easily observable attributes compared with those that are more subjective (internal) to the child [15, 17]. However, in this study, we found that with the CHU9D, a high level of agreement was obtained for the psychosocial health dimension ‘sad’. The agreement for physical health-related dimensions (‘daily routine’ and ‘able to join in activities’) was lower but still moderate. These findings contrasted with the agreement observed for similar PedsQL™ items. For example, agreement was higher for PedsQL™ physical health items, i.e. ‘participating in sports activity or exercise’ and ‘taking a bath or shower by him or herself’, as compared with the ‘feeling sad’ item. Previous studies have reported a low agreement for pain using preference-based [33, 34, 35, 36, 37] and non-preference-based measures [38, 39]. In this study, a substantially higher level of agreement was observed for the ‘pain’ dimension with the CHU9D as compared with the ‘getting aches and pains’ item of the PedsQL™. Therefore, these findings suggest a possible interaction between the measure used and the dimension under consideration in determining the degree of agreement.

The findings in this study indicated a higher agreement for the CHU9D dimensions compared with the corresponding PedsQL™ items. Whilst both measures were developed for use in children and adolescents, the CHU9D followed a bottom-up approach that directly involved children in the development and validation of the instrument [5], whereas the PedsQL™ adopted a top-down approach and was developed on the basis of a broader study of HRQoL in children with cancer [40]. The difference in agreement may also be attributed to the timeframe of assessment for each measure. In the CHU9D, respondents are asked about the (child’s) health ‘today’ whereas the PedsQL™ asks the respondent to report on their health over the ‘past one month’. Thus, one possible explanation for the higher agreement found within the CHU9D dimensions may be its shorter time frame, which may reduce recall bias and result in less variability in perceived HRQoL [4]. Another contributing factor may be the difference in what the CHU9D and PedsQL™ measures assess. The CHU9D measures the severity of impairment whereas the PedsQL™ measures frequency. For example, in the CHU9D dimension ‘sad’, the response levels range from ‘don’t feel sad’ to ‘feel very sad’, whilst the PedsQL™ response levels for the corresponding item ‘feeling sad’ range from ‘never’ to ‘almost always’ [4].

Studies reporting the level of child–parent agreement predominantly focus on samples including children aged 8 years and above [41,42,43,44,45]. The evidence for agreement in younger age groups, e.g. 6–7 years old and capable of self-reporting their HRQoL using the PedsQLTM or the CHU9D is limited [21, 46]. In this study, dyads comprising the youngest age-group (6–7 years) reported relatively lower agreement with the CHU9D. This may be owing to children in this age group differing in their understanding of HRQoL as compared with their parents [47]. Younger children under 10 years of age have been reported to have difficulties with comprehension and recall of health-related events, as well as the associated frequency and severity [47]. However, except for the ‘tired’ dimension, there was no clear association between age and agreement across any other CHU9D dimensions. In contrast to the CHU9D findings, the older age groups, particularly the 11–12-year-olds, showed worse agreement for the PedsQLTM items compared with the youngest age group comprising 6–7-year-olds. The evidence in the literature examining the relationship between age of the child and agreement using both preference and non-preference-based measures is inconsistent [17, 21, 48, 49]. This study found conflicting results in the same population for the two measures. The reasons for these discrepancies are unclear. Further research including mixed methods studies, which combine quantitative investigations with in-depth qualitative research using cognitive interviewing techniques, for example ‘think-aloud’, may be helpful in providing a more detailed understanding of the reasons for these discrepancies in reporting child HRQoL [50].

The existing literature on the influence of health status of the child on agreement is inconsistent for both preference and non-preference-based measures [17, 18]. Some studies suggest that in chronic illnesses, greater severity of the disease [51] or a higher frequency of exacerbations [52] may be associated with higher levels of child-parent agreement. However, for chronic conditions like cancer, there is a lack of consensus regarding the degree of agreement [15, 17]. Conversely, acute illnesses have been associated with lower inter-rater agreement [53]. Notably, Catchpool et al. reported a low agreement (Pearson’s correlation coefficient = 0.13) in a sample of Australian children aged 11–12 years and their parents with the CHU9D [54]. Similarly, in this study, a lower agreement was observed for the overall HRQoL across both the measures for children without any reported health condition than those with reported health conditions. Additionally, a higher maternal than paternal involvement in childcare has been linked to the higher mother–child agreement levels evident in literature [17, 55, 56]. In this study, a similar trend was observed with the CHU9D, but this was not consistently reflected with the PedsQLTM. Other studies have indicated that parental gender might not significantly confound parent proxy reports of child HRQoL [57, 58]. Considering that the literature is inconclusive, and the limited sample size of this study, further research with a larger sample size is warranted to substantiate our findings.

This study has limitations that are important to highlight. The study was conducted in a community-based sample of South Australian children who were relatively healthy. Hence, the findings may not be generalisable to more diverse samples including children with regular contact with health services and children with disabilities. Whilst the study sample was relatively small, good representation was achieved across age groups and approximately one-third of children were living with health conditions and/or living in areas of relative disadvantage. However, the main findings, particularly in relation to age group analyses, need to be interpreted with caution and further research needs to be conducted to substantiate these findings in larger community-based and patient samples. The CHU9D utility weights employed in this study were established using adolescents aged 11–17 years and then applied to a sample that included a younger age group. It is recognised that the value sets derived from children/adolescents may differ from those derived from adults adopting a child’s perspective [59]. Nevertheless, additional research is required to determine the youngest age at which children can provide valuations, taking into account ethical considerations, and to explore the potential impact of this on valuing child HRQoL across different age groups. Moreover, as the utility weights were used to estimate the CHU9D scores, an additional preference-weighted step not currently available for the PedsQL™, score comparisons between the two measures were difficult. Finally, the study investigated agreement between child–parent dyads using the CHU9D and PedsQLTM only, and hence the findings may not necessarily be generalisable to other measures used in the assessment of HRQoL in child populations.

5 Conclusion

This study found low child–parent agreement for overall HRQoL across both measures, with the CHU9D exhibiting lower agreement relative to the PedsQL™. In contrast, at the individual dimension level, inter-rater agreement was higher for the CHU9D than for the PedsQL™. The CHU9D showed the highest agreement for the dimensions of ‘sad’ and ‘pain’, whereas for the PedsQL™, agreement was the highest for the physical health items. There was no clear interaction between age and CHU9D dimensions. However, for the relevant PedsQL™ items, the dimension-level agreement was stronger for the youngest children (6–7 years) in the sample and weaker for older children (8–10 and 11–12 years), particularly for the psychosocial health items. Further research in larger and more diverse study samples and across age groups is needed to substantiate these findings. The introduction of a preference-based scoring algorithm for the PedsQL™ will also facilitate empirical comparisons of child–parent agreement at the overall utility level and enable the impact of child and parent perspectives on HRQoL benefits to be assessed in economic evaluations of interventions targeted at paediatric populations.