The value of a universal outcome measure such as the quality-adjusted life-year (QALY) has long been recognized for health funding allocation decisions that span diverse patient populations. Consequently, guidelines for economic evaluations produced and disseminated by health technology assessment organizations around the globe recommend cost-utility analysis as the preferred analytic technique [1,2,3]. Decision makers’ preference for cost-utility analysis evidence has brought into stark relief significant challenges in measuring QALYs in pediatric populations. These challenges include, but are not limited to: a lack of valid preference-weighted health-related quality-of-life (HRQoL) measures for children of different ages, particularly infants, toddlers, and preschoolers; cognitive barriers to direct elicitation of utilities; the need to rely on proxy reports for child health states; and disagreement regarding whose perspective—child or adult—is most relevant for valuing and measuring child health states [4,5,6,7,8,9,10]. Facing difficult funding allocation decisions regarding child health interventions where QALY evidence is lacking, health technology assessment agencies have taken note of these challenges. Initiatives undertaken in recent years to examine these issues more closely include a 2020 joint National Institute for Health and Care Excellence-International Society for Pharmacoeconomics and Outcomes Research roundtable attended by 22 participants representing 11 health technology assessment agencies and a 2021 EuroQol workshop on valuing health in children.

In 2019, the Australian government’s Medical Research Future Fund Preventive and Public Health Research Initiative announced a Targeted Health System and Community Organisation Research Grant competition with a call for applications on “tools to value health change in paediatric populations.” This topic, spurred by the Australian Pharmaceutical Benefits Advisory Committee, called for research to “gather evidence regarding the appropriateness of existing tools to value health change in paediatric populations, particularly for the conduct of cost-utility analysis, and then to undertake valuation studies to provide standardized utility scores for some of the leading health state instruments.” The present collection includes findings from one of the funded teams, the QUOKKA Research Program [11].

The findings presented in this collection stem mainly from the Australian Paediatric Multi-Instrument Comparison (P-MIC) study [12]. This large prospective observational study, conducted in 2021–2022, collected data concurrently for multiple generic and disease-specific pediatric HRQoL instruments to assess and compare their psychometric performance across a range of age groups and patient populations. Close to 6000 Australian children and their parents/caregivers were enrolled across three samples: a hospital-based sample of children with a wide range of health conditions, an online general population sample, and an online sample of disease-specific groups.

In a study comparing the psychometric performance of the PedsQL, EQ-5D-Y-3L, EQ-5D-Y-5L, Child Health Utility 9D (CHU9D), AQoL-6D, and Health Utilities Index Mark 3 (HUI3), stratified by child age, report type, and child health status, Jones et al. found that all instruments demonstrated known-group, convergent, and divergent validity and that the EQ-5D-Y-3L demonstrated ceiling effects [13]. Among the preference-weighted HRQoL instruments, the EQ-5D-Y-3L, EQ-5D-Y-5L, and CHU9D demonstrated acceptable test-retest reliability and responsiveness to improving or worsening health. The AQoL-6D and HUI3 were not administered to the full sample, and assessments of their test-retest reliability and responsiveness were inconclusive. Importantly, performance varied by child age and report type (parent vs self). In a sub-study of children and adolescents with mental health conditions, the PedsQL, CHU9D, EQ-5D-Y-3L, and EQ-5D-Y-5L demonstrated good performance for acceptability/feasibility, known-group validity, and convergent validity [14]. The CHU9D and PedsQL showed no floor or ceiling effects and fair to good test-retest reliability, while the EQ-5D-Y-3L showed the highest ceiling effects and lower test-retest reliability. The AQoL-6D and HUI3 demonstrated good acceptability/feasibility, no floor or ceiling effects, and good convergent validity, yet poorer performance on known-group validity. The instruments did not perform equally well for all psychometric properties, indicating that researchers must choose carefully depending on the age, gender, and type of mental health condition of the study population, as well as the aim of the study [14].

A direct head-to-head comparison of the EQ-5D-Y-3L and EQ-5D-Y-5L revealed lower ceiling effects and greater discriminatory power for the Y-5L compared with the Y-3L in both proxy-report and self-report groups [15]. Ceiling effects were slightly higher for proxy respondents than for self-report respondents. The findings confirm previous studies showing that caregivers and children report HRQoL differently [16, 17].

A particular challenge is assessing HRQoL in toddlers. van Heusden et al. compared adapted versions of the EQ-5D-Y-3L and EQ-5D-Y-5L with the original versions in children aged 2–4 years [18]. The adapted versions included age-appropriate revised wording for the dimension levels and/or examples of the dimensions. Strong convergence between the adapted and original EQ-5D-Y-3L and EQ-5D-Y-5L instruments was observed, with the adapted versions of both questionnaires showing more responses distributed in the more severe levels, particularly in the usual activities and mobility dimensions. Likewise, greater effect sizes for known-group validity were observed for the adapted versus the original versions when comparing children with and without health conditions. Test-retest reliability was also better for the adapted versions [18]. A separate analysis examined the psychometric properties of the child-specific CHU9D proxy version with guidance notes in children aged 2–4 years [19]. The CHU9D did not demonstrate ceiling effects and correlated moderately to strongly with comparable items in the PedsQL. In this age group, the CHU9D differentiated between groups with known health differences with moderate-to-large effect sizes but showed low test-retest reliability for some dimensions at the 2-day follow-up [19].

Examining optimal strategies for assessing preference-weighted HRQoL in young children inevitably requires close examination of alternative approaches to proxy assessment. Using the EQ-5D-Y-3L administered to a community sample of parent-child dyads, Khanna et al. compared the reports of proxies asked to give their own view of their child’s HRQoL with those of proxies asked to respond as they believed their child would, and with the children’s self-reports [20]. Agreement between self-report and both proxy types was low, with the greatest disagreement observed for “feeling worried, sad or unhappy,” though agreement was better for proxies asked to report as they thought their child would. This analysis showed that the way proxy questions are framed can affect responses and highlights the need to clearly define and assess the impact of alternative proxy approaches. Proxy reports differ from self-reports of child health in part because of the influence of the parent/caregiver’s own health state on their assessment. The performance of the EuroQol Health and Wellbeing instrument short form (EQ-HWB-S) was examined in caregivers of children with health conditions [21]. The EQ-HWB-S is a new experimental preference-weighted instrument designed to measure and value health and well-being over a broader range of attributes than the EQ-5D, with a particular focus on caregivers and those receiving health and/or social care [22]. For that reason, it is potentially of interest for assessing health and well-being outcomes in the parents and other family members of children requiring health and/or social care. Worse EQ-HWB-S scores were observed in parents of children with health conditions, parents of children with special healthcare needs, and parents who reported being impacted by COVID-19 [21].

To investigate more deeply the differences between existing measures used for health state preference ascertainment in children, and their alignment with an overarching model of child HRQoL, an exploratory factor analysis was conducted by pooling items from the EQ-5D-Y-5L, CHU9D, PedsQL, and HUI using data collected in the P-MIC study [23]. While the emerging factors reflected the attributes common to these instruments, different but overlapping structures were found for the proxy-reported and self-reported data [23].

Finally, in research aiming to further expand the means by which HRQoL in young children might be assessed, a systematic review was conducted of patient-reported outcome measures (PROMs) that use audio, visual, animation, and adapted easy-read methods to assess quality of life in children [24]. The review identified 22 PROMs covering a diverse range of domains, including physical and emotional health and social functioning. Almost half engaged children in their development. While child-friendly tools are welcome, they must adhere to stringent criteria for validity, reliability, and responsiveness.

It is of note that the comparative performance results do not signal any one measure as broadly superior to the others. While a universal concept of HRQoL is appealing for standardizing QALY-based budget allocation decision making, this research reveals that there is no gold standard instrument that can be recommended for use in all age groups and patient populations. For those developing guidelines for health technology assessment in pediatric age ranges, the results suggest that some flexibility regarding which instrument or instruments are recommended, and for which purposes, is prudent.

The studies by Khanna et al. and Bailey et al. provide further insight into another fundamental aspect of measuring child health, namely the involvement of parents and caregivers, either as proxy reporters of their child’s health [20] or as individuals whose own health may be affected by their child’s condition [21]. A proxy perspective on a child’s health state, whether “within the skin” or a parent’s own rating, is often necessary, even in older children, if the child is too ill or otherwise unable to respond. For infants and toddlers, a “within the skin” proxy perspective is not possible, and an adult rating based on observation may be the only proxy option. Understanding the impact of perspective both on deriving underlying utility weights and on measuring a child’s HRQoL is an essential area for future research [25, 26].

The use of proxy responses is but one way to extend the measurement of child health to as many children as possible. Exploring additional options that include the use of audio, visual, and/or animated elements for direct elicitation, as seen in the review by Mpundu-Kaambwa et al., is another exciting area of research, though arguably one in its early stages [24].

Altogether, the papers in this collection cover a range of topics that are critical to advancing the measurement and valuation of health in children. In particular, it is rare to see multi-instrument comparisons performed in a large and diverse population of children with and without health problems. Given the rigor with which the fieldwork and analysis were carried out, the studies provide some of the strongest evidence currently available for understanding and interpreting the comparative performance of preference-weighted instruments intended for use in children. A strength was the inclusion of a wide range of instruments, including newer tools such as the EQ-5D-Y-5L and EQ-HWB-S. The assessments of the performance of the CHU9D and the adapted EQ-5D-Y in age groups younger than those for whom these tools were designed are also of keen interest. In particular, the assessment of the performance of existing instruments in children aged 2–4 years is welcome given the lack of preference-weighted PROMs for this population. It should be noted that the TANDI (currently known as the EQ-TIPS), a preference-weighted tool in development for use in the 0–3 years age range [27], was included in the P-MIC study, and future findings will be a welcome addition. Other tools in development for this age group—arguably the most difficult to assess—include the Health Utilities Pre-School (HuPS) [28], an extension of the HUI3, and the Infant health-related Quality of life Instrument (IQI) [29]. Given the uniqueness of this population in terms of health attributes and the dependency on parents to attain health and healthy development, ensuring that the underlying construct of HRQoL is captured will be essential [30].

One limitation, as the authors have pointed out, is that the AQoL-6D and HUI3 were tested in small samples, making comparisons with other tools potentially misleading. This was perhaps a missed opportunity, but understandable given concerns regarding respondent burden. Further comparative research with these instruments will be required. Another limitation was the use of level sum scores to assess validity and reliability in the absence of underlying tariff sets. As such, these results should be considered preliminary. Previous studies have illustrated differences in utilities for the same pediatric tool when different tariff sets (e.g., adolescent vs adult) were applied [31, 32]. It is therefore essential that comparative psychometric performance be assessed based on the full scoring intended in the instrument design. A significant but inevitable limitation was the inability to determine to what extent the included instruments accurately captured the construct of pediatric HRQoL. As the authors acknowledged, attributes relevant for children and very young children must reflect their developmental stage and may not be the same as those found in adult instruments. As yet there is no consensus on what constitutes HRQoL for children of different ages. There is an inherent tension between conserving the attributes and levels used in adult tools and developing tools that are more reflective of infant or young child HRQoL. While the former enables easier pooling of data across pediatric and adult age groups for lifetime modeling, it may lack construct validity. One might argue that using different tools for different age groups for lifetime modeling and QALY calculation is not inappropriate if each tool is conceptually valid for its age group. Finally, the P-MIC study was performed in an Australian population, thereby potentially limiting generalizability to other populations. The P-MIC study team is engaging with international collaborators to conduct additional comparative performance research on these tools.
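To illustrate why level sum scores are only a provisional substitute for tariff-based scoring, the brief sketch below contrasts the two for the same descriptive responses. The five-dimension descriptive system, the decrement values, and both value sets are purely hypothetical and are not drawn from any published tariff; the point is simply that states with identical level sum scores can carry different utilities, and that the gap can change when a different value set is applied.

```python
# Hypothetical illustration: level sum score vs tariff-based utility.
# Two health states on a made-up 5-dimension, 3-level descriptive system
# (level 1 = no problems, level 3 = severe problems).
state_a = [2, 2, 1, 1, 1]  # moderate problems on the first two dimensions
state_b = [1, 1, 1, 1, 3]  # severe problems on the last dimension only

def level_sum_score(state):
    # Unweighted sum of reported levels (lower = better health).
    return sum(state)

def utility(state, tariff):
    # Utility = 1 minus the tariff's decrement for each reported level.
    return 1.0 - sum(tariff[dim][level] for dim, level in enumerate(state))

# Two hypothetical value sets (decrement per dimension and level).
value_set_1 = [{1: 0.00, 2: 0.05, 3: 0.20}] * 4 + [{1: 0.00, 2: 0.10, 3: 0.45}]
value_set_2 = [{1: 0.00, 2: 0.10, 3: 0.30}] * 4 + [{1: 0.00, 2: 0.05, 3: 0.25}]

for label, state in (("A", state_a), ("B", state_b)):
    print(label, level_sum_score(state),
          round(utility(state, value_set_1), 2),
          round(utility(state, value_set_2), 2))
# Output:
#   A 7 0.9 0.8
#   B 7 0.55 0.75
# Both states share a level sum score of 7, yet their utilities differ within
# and between the two hypothetical value sets, so psychometric results based
# on level sum scores need not carry over once a tariff is applied.
```

In practice, published adolescent or adult value sets would be substituted as they become available, which is precisely why the level-sum-score results are flagged as preliminary.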

Taken as a whole, this collection amply illustrates both the challenges of conducting this research and the growing interest in pediatric PROM research. The challenges include limited available sample sizes, the difficulty of recruiting and collecting data directly from children across a broad spectrum of developmental stages, and the need to rely on parent and caregiver proxies when children cannot be heard directly. The QUOKKA team has made a substantial contribution to helping us deal with these challenges, though much work remains. Other active research in this area includes work by the Tools for Outcomes Research to measure and Value Child health (TORCH) team [33]. Using systematic reviews, prospective comparative research, and mixed quantitative and qualitative methods, this team is investigating the conceptual and methodologic underpinnings of measuring and valuing child health [9, 34,35,36,37,38,39,40]. The team at Technology Assessment at SickKids (TASK) is conducting prospective research to assess the psychometric performance of preference-weighted HRQoL instruments in children with inflammatory bowel disease [31, 41, 42] and intestinal failure. Several groups are deriving utility weights for health states described by the generic non-preference-weighted PedsQL via mapping to the CHU9D and EQ-5D [43,44,45,46,47,48]. Another promising area of research is the use of direct elicitation methods, including discrete choice experiments, to value improvements in children’s health [32, 49,50,51].

The annual volume of published pediatric cost-utility analyses continues to rise sharply; in 2022, a record 176 studies were published [52]. This rising demand for pediatric cost-utility analyses must be met with rigorous methods and tools. It is hoped that the present collection and other active research in this field will meet this need, contributing the highest-quality evidence to inform healthcare decision making.