Introduction

As of April 30, 2023, the World Health Organization (WHO) has reported over 765 million confirmed COVID-19 cases and 6.9 million deaths worldwide [1]. Beyond pulmonary complications and deaths, COVID-19 can impact the physical, emotional, and social well-being of children and adolescents, with increased rates of anxiety and depressive symptoms observed from ages 7 to 17 [2]. Longitudinal studies have indicated that COVID-19 impacts on physical and school-related aspects in adolescents aged ≥ 14 years [3]. Notably, research focusing on the vulnerable age group of birth to three years is scarce, despite their developmental stage and reliance on carers. A single study of infants and toddlers with atopic dermatitis found significantly worse HRQoL during the pandemic, underscoring the need for further research in this age group [4]. Furthermore, recruiting a population with COVID-19 not only offers insights into the direct effects of the disease but also presents a unique opportunity to study a diverse range of symptom severities among younger children, thus providing valuable data to understand the full spectrum of the pandemic’s impact on paediatric populations. However, existing generic HRQoL instruments which have been used to assess HRQoL in children and adolescents affected by COVID-19, such as KIDSCREEN-10, PedsQL 4.0, and KINDL-R questionnaire [5,6,7], lack societal preference-based scores which can be used in economic evaluations.

The EQ-5D-Y-3L is a widely used quality of life instrument for children and adolescents [8]. It yields a preference-based index score that can be used to calculate Quality-Adjusted Life Years (QALYs). This provides insights into healthcare resource utilization and costs related to children’s HRQoL impact. The EQ-5D-Y-3L is available in two administration versions: self-complete and proxy. The proxy version is crucial for parental evaluation when children cannot self-rate. For children aged 4–7 years, a proxy version should be used. In children aged over eight years, the self-complete version is generally recommended [9]. Despite demonstrating good reliability, validity, and responsiveness in paediatric patients with severe pneumonia and other respiratory conditions [10, 11], the instrument lacks specific psychometric assessment for younger populations affected by COVID-19. The newly developed Chinese EQ-5D-Y-3L value sets also demand evaluation [12], especially for known-group validity and responsiveness in head-to-head studies [13,14,15]. Moreover, studies indicate that parents of children with COVID-19 or other infections may underestimate their child’s HRQoL [10, 16], underscoring the importance of examining agreement and discordance between self-complete and proxy versions [9].

The experimental version of EQ-TIPS (EQ Toddler and Infant populations questionnaire), developed in 2018, assesses the physical, mental, emotional, and social functions of children aged 0 to 36 months [17]. Although currently in the experimental phase with no definitive version or available value sets, the EQ-TIPS has shown good construct validity in young children who have undergone general surgery, burn injury, or cardiac surgery [18]. However, additional research is needed to explore other properties, including reliability (examined in a small sample of the general population) [19], feasibility, and clinical utility. This research will be useful in moving the experimental EQ-TIPS towards an approved version, particularly with regard to cross-cultural validity.

During the COVID-19 pandemic in China, an opportunity arose to test the psychometric properties of the EQ-5D-Y-3L and the experimental EQ-TIPS in paediatric patients with this condition, using the newly published Chinese value sets for the EQ-5D-Y-3L. Therefore, this study had three objectives. First, to assess the validity, inter-rater reliability, and responsiveness of the self-complete version of the EQ-5D-Y-3L in children and adolescents aged 4–18 years. Second, to compare outcomes between the self-complete and proxy versions of the EQ-5D-Y-3L in those aged 6–18 years. Finally, to evaluate the distributional properties, known-group validity, and responsiveness of the experimental EQ-TIPS in children with COVID-19 aged under four years.

Methods

Sampling

This was a descriptive, longitudinal, prospective study with a repeated-measures design to test the reliability, validity and responsiveness of the instruments. We recruited paediatric inpatients and outpatients with confirmed COVID-19 infection who were treated at Renji Academic Hospital in Shanghai from May 2022 to January 2023, along with their parental carers. A control group, consisting of infants, children and adolescents who tested negative for COVID-19 and had no related symptoms, was recruited using a ‘snowball approach’, primarily by reaching out to the siblings and friends of the patients.

For paediatric patients, the inclusion criteria were: (1) aged between 0 and 18 years; (2) confirmed COVID-19 infection through PCR (Polymerase Chain Reaction) or antigen test; (3) newly diagnosed by a specialist within the past month, without prior infection; and (4) admitted as inpatients or receiving outpatient care. Individuals aged 6–18 years, proficient in Chinese, and capable of independent questionnaire completion were eligible for the self-complete version. Those with other known respiratory viral infections within the preceding three months or known chronic health conditions were excluded.

For non-infected children and adolescents in the control group, the inclusion criteria were as follows: aged 0–18 years, no history of confirmed COVID-19 infection based on negative PCR or antigen test results, and generally healthy with no illnesses or symptoms suggestive of COVID-19 in the past three months. Exclusions applied to individuals not well enough to complete surveys or lacking written informed consent from legal guardians.

For carers, inclusion criteria were: (1) being a primary carer who was present with the eligible child in the week before the survey, (2) being a parent of an eligible child respondent, (3) being physically present during the outpatient visit or admission, and (4) being cognitively able to complete the surveys. The study received approval from the institutional medical ethical review board of Guizhou Medical University (Approval number: GMU2022303).

Instruments

The EQ-5D-Y-3L assesses HRQoL with five dimensions (mobility; looking after myself; doing usual activities; having pain or discomfort; and feeling worried, sad, or unhappy) and three severity levels. Each health state in the EQ-5D-Y-3L can be summarized using level descriptors, generating 243 (3⁵) unique states. The best state, 11111, indicates ‘no problems’ in any dimension, while the worst state, 33333, indicates ‘a lot of problems’ in all dimensions. An index score of 1.0 represents the value of full health, and a score of 0.0 the value of death. Negative values represent health states with values below the value of death. The collection of index scores for all possible states is called a ‘value set’. The instrument also includes a 20-cm visual analogue scale (EQ VAS) for overall health rating [20]. We used proxy version 1 in this study, in which caregivers provide their impression of the child’s health on the survey day [9].

The experimental version of EQ-TIPS, completed by the primary caregiver or parent, assesses six dimensions: movement; play; pain; social interaction; communication; and eating. Like the EQ-5D-Y-3L, each dimension has three severity levels, forming a 6-digit code with 729 (3⁶) unique health states. The best state is 111111. The EQ VAS is also included. In this study, the EQ-TIPS assessed HRQoL for children aged under four years old, as the EQ-5D-Y-3L proxy version is recommended for those aged four years and older [9].
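The numbers of health states quoted above follow directly from having three levels per dimension. As a purely illustrative sketch (the function name `enumerate_states` is our own label and is not part of any EuroQol software), the state codes could be enumerated as follows:

```python
from itertools import product

def enumerate_states(n_dimensions, n_levels=3):
    """Return every health-state code as a digit string, e.g. '11111'.

    Hypothetical helper for illustration only.
    """
    levels = [str(level) for level in range(1, n_levels + 1)]
    return ["".join(combo) for combo in product(levels, repeat=n_dimensions)]

y3l_states = enumerate_states(5)   # EQ-5D-Y-3L: five dimensions
tips_states = enumerate_states(6)  # EQ-TIPS: six dimensions

# First and last codes are the best and worst states, respectively.
print(len(y3l_states), y3l_states[0], y3l_states[-1])
print(len(tips_states), tips_states[0], tips_states[-1])
```

The first code in each list is the ‘no problems’ state and the last is the state with level 3 on every dimension.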

The Chinese versions of EQ-5D-Y-3L and the experimental EQ-TIPS underwent translation per EuroQol Group guidelines [21]. Observations from previous surveys revealed a tendency for respondents to omit the impact of COVID-19, likely due to fluctuating conditions in many patients. Therefore, we proposed slight modifications to the instructions of the EQ-5D-Y-3L and the EQ-TIPS. Specifically, we added a short phrase before the original instructions as follows: (1) For the baseline survey completed by proxies: In comparison to the situation before the outbreak of the pandemic, please tick the ONE box that you think best describes the child’s health TODAY; (2) For the follow-up survey completed by proxies: In comparison to the situation during the outbreak of the pandemic, please tick the ONE box that you think best describes the child’s health TODAY; (3) For the self-completed version of EQ-5D-Y-3L: Taking into account the impact of the coronavirus pandemic, please tick the ONE box that you think best describes your health TODAY. The modification was approved by the EuroQol Research Foundation for use in the current study.

The Overall Health Assessment question (OHA), a valid measure of subjective health in children and adolescents [22], was phrased as ‘How is your overall health today? Is it excellent, good, fair, poor, or very poor?’ The proxy version gathered the caregiver’s impression of the patient’s overall health on the survey day.

The Chinese COVID-19 severity criteria were: (1) Mild: respiratory symptoms and fever; (2) Moderate: persistent high fever, cough, shortness of breath, with pneumonia imaging; (3) Severe: includes indicators such as high fever, tachypnoea, low oxygen saturation, respiratory distress, altered consciousness, and feeding difficulties [23].

Clinical recovery from COVID-19 was defined as having normal body temperature for over 3 days; mostly disappeared or significantly improved symptoms; significant absorption of pneumonia lesions on follow-up CT scan (if present); and either two consecutive negative RT-PCR tests, RT-PCR cycle threshold value ≥ 35, or three consecutive negative antigen tests [23].

Procedures

All consenting patients and parents independently completed the baseline survey using tablets in clinics or wards on the hospital admission day or during outpatient visits. Healthy children and adolescents, along with their parents, completed the survey at home using a smartphone. For children aged 4–18 years, parental carers provided sociodemographic information and completed the EQ-5D-Y-3L questionnaire (digital proxy version, including EQ-VAS), a five-point overall health assessment (OHA) question (proxy version), and questions on the parent’s demographics and the latest COVID-19 test result. For children under four years, the EQ-TIPS was used instead of the EQ-5D-Y-3L (Fig. 1).

The survey for children and adolescents aged over six years included the digital self-complete version of the EQ-5D-Y-3L, EQ VAS, and OHA question. Participants were invited to complete the same questionnaire during follow-up visits to outpatient clinics or on the day of discharge. Follow-up survey forms mirrored baseline forms, excluding demographic questions. On the survey day, the patient’s attending clinician completed the medical record including COVID-19 manifestations, disease duration, complications, severity per Chinese COVID-19 guidelines [23], and treatment.

Fig. 1 Flow chart of the study from recruitment of children and their parent carers

Data analysis

We calculated descriptive statistics to summarize demographic, socioeconomic, and clinical characteristics. The construct validity, inter-rater agreement, and responsiveness of the EQ-TIPS and the EQ-5D-Y-3L dimensions and summary scores (EQ index score, level sum score and EQ VAS) were assessed.

For EQ-5D-Y-3L: the index score and EQ VAS were generated separately for self-complete (≥ six years) and proxy versions, using the Chinese EQ-5D-Y-3L value set [12]. The index score ranges from − 0.088 to 1, with higher values indicating better health utility.

For EQ-TIPS: as no preference-based scoring is available, a level sum score (LSS) was employed to summarize responses on the descriptive system. Numeric values ranged from 6 (no problems on all six dimensions: 1 + 1 + 1 + 1 + 1 + 1 = 6) to 18 (most severe score: level 3 on all dimensions: 3 + 3 + 3 + 3 + 3 + 3 = 18) [24].
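As a minimal sketch of the level sum score described above, the following assumes that an EQ-TIPS response is stored as a six-character string of levels; the function name `level_sum_score` and this storage format are illustrative assumptions, not part of the instrument.

```python
def level_sum_score(state):
    """Sum the six dimension levels of an EQ-TIPS state code such as '121131'.

    Hypothetical helper: assumes one digit (1, 2 or 3) per dimension.
    """
    if len(state) != 6 or any(ch not in "123" for ch in state):
        raise ValueError("expected six digits, each 1, 2 or 3")
    return sum(int(ch) for ch in state)

print(level_sum_score("111111"))  # best state: 6
print(level_sum_score("333333"))  # worst state: 18
```

Because the LSS is an unweighted sum, different states can share the same score; it summarizes severity but carries no preference weighting.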

We evaluated known-group validity by comparing summary scores across four health status categories at baseline: 1) with or without COVID-19; 2) three grades of disease severity of COVID-19 (mild, moderate, or severe); 3) presence of two or more symptoms versus none or one symptom; and 4) ‘excellent’/‘good’ versus ‘fair’/‘poor’/‘very poor’ Overall Health Assessment (OHA). Appendix Table 1 provides a detailed breakdown of the symptoms observed in our study population, allowing for a clearer understanding of how symptom presence correlates with disease severity. We hypothesized higher EQ-5D-Y-3L index scores and EQ VAS, as well as lower EQ-TIPS LSS, in ‘good’ health groups compared to ‘poor’ health groups. We used independent t-tests and ANOVA for comparisons, with Cohen’s d effect size (ES = difference in means / pooled SD) indicating the relative efficiency in discriminating between patients with different health conditions [25]. Dimension-level distributions were compared using the Chi-square test, or Fisher’s exact test if any cell had an expected count of less than 5.
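To make the effect size definition concrete, the sketch below computes the difference in means divided by a pooled SD. The text does not spell out the pooling formula, so the usual sample-size-weighted pooled variance is assumed here; the function name `cohens_d` and the example EQ VAS values are invented for illustration and are not study data.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled SD (assumed pooling formula)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Made-up EQ VAS ratings, for illustration only.
non_infected_vas = [90, 85, 95, 80]
infected_vas = [70, 60, 75, 65]

print(round(cohens_d(non_infected_vas, infected_vas), 2))
```

A positive value here means the first group scored higher; for the EQ-TIPS LSS, where higher scores indicate more problems, the sign would be reversed.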

Table 1 Descriptive statistics of the sample (n = 1092)

Inter-rater reliability was evaluated between patients and carers on the EQ-5D-Y-3L at baseline. For dimensions, Gwet’s AC1 assessed agreement [26], with values categorized as: below 0.2 (poor), 0.21–0.4 (fair), 0.41–0.6 (moderate), 0.61–0.8 (good), and above 0.8 (excellent) [27]. The agreement on index score and EQ VAS used the intraclass correlation coefficient (ICC), with values classified as: below 0.1 (no agreement), 0.1–0.29 (low agreement), 0.3–0.49 (moderate agreement), 0.5 or higher (high agreement), and 0.7 or above (good reliability) [28].
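For readers unfamiliar with Gwet’s AC1, the following sketch implements the standard two-rater formula, in which chance agreement is derived from the average category prevalence across both raters. It is an assumption that this matches the computation performed by the statistical software; the function names and example ratings are hypothetical. The interpretation bands follow those listed above, with boundary values assigned to the lower band as a further assumption.

```python
from collections import Counter

def gwet_ac1(rater_a, rater_b, categories=(1, 2, 3)):
    """Gwet's AC1 for two raters over paired categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be paired and non-empty")
    n, q = len(rater_a), len(categories)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    chance = 0.0
    for k in categories:
        pi_k = (counts_a[k] + counts_b[k]) / (2 * n)  # mean prevalence of level k
        chance += pi_k * (1 - pi_k)
    chance /= q - 1
    return (observed - chance) / (1 - chance)

def classify_ac1(value):
    """Map an AC1 value to the agreement bands used in this study."""
    for upper, label in ((0.2, "poor"), (0.4, "fair"), (0.6, "moderate"), (0.8, "good")):
        if value <= upper:
            return label
    return "excellent"

# Invented child and parent responses on one dimension (levels 1 to 3).
child = [1, 1, 2, 3, 1, 2]
parent = [1, 2, 2, 3, 1, 1]
ac1 = gwet_ac1(child, parent)
print(round(ac1, 3), classify_ac1(ac1))
```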

Responsiveness was assessed in patients showing clinical recovery or OHA improvement from baseline to follow-up, using independent t-tests to compare mean summary scores. Changes in ‘no problem’ proportions for each dimension were analysed. The results include the Glass’ Δ effect size (ES = difference in means / baseline SD), which is recommended when the intervention might influence the standard deviation [29]. The percentages of respondents reporting ‘no problems’ are detailed in Appendix-Table 3. All analyses were performed in SPSS (IBM SPSS Statistics, Version 26.0, IBM Corp).
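Although the analyses in this study were run in SPSS, the Glass’ Δ definition given above can be illustrated with a short sketch. The function name `glass_delta` and the example values are invented for illustration and do not come from the study data.

```python
from statistics import mean, stdev

def glass_delta(baseline, follow_up):
    """Change in mean score divided by the baseline (sample) SD."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)

# Made-up EQ VAS ratings at baseline and follow-up, for illustration only.
baseline_vas = [50, 60, 70]
follow_up_vas = [80, 85, 90]

print(glass_delta(baseline_vas, follow_up_vas))
```

Only the baseline SD enters the denominator, so any narrowing of the score distribution after recovery does not inflate the effect size. For the EQ-TIPS LSS, improvement appears as a negative value under this sign convention.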

Results

Figure 1 illustrates the recruitment of this study involving 1092 children (0–18 years) and their parental caregivers. Among them, 78.8% were COVID-19 infected (average duration: 10.9 days), with 311 completing the follow-up survey after one to three weeks. The control group (21.2%) comprised non-infected children staying at home for at least three months. Baseline characteristics presented in Table 1 showed no significant differences except in residence. Most EQ-TIPS respondents were aged 2–4 years (63.5%), while the EQ-5D-Y-3L completers were mostly 6–11 years old (61.0%). Approximately 80% of patients had at least two symptoms, two-thirds had moderate to severe disease, and most caregivers were highly educated (92.1%).

Figure 2a shows that all EQ-TIPS dimensions contribute to lower scores, with parents of non-infected children reporting fewer problems than those with COVID-19 (p < 0.001), except for ‘communication’ (p = 0.110). The proportion reporting ‘no problems’ ranged from 51.3% for ‘pain’ to 74.8% for ‘communication’. Full health (111111) was reported by 30.4% of patients and 61.0% of non-infected children, indicating a higher ceiling effect in the healthier non-infected group across all dimensions as expected, ranging from 83.1% for ‘pain’ to 96.1% for ‘movement’.

Figure 2b indicates a significant trend of reporting more problems by infected children aged ≥ 4 years on the EQ-5D-Y-3L. In the self-complete version, the proportion of patients reporting ‘no problems’ ranged from 49.0% for ‘pain/discomfort’ to 73.9% for ‘looking after myself’. The proxy version showed a similar pattern with slightly more problems reported, particularly at the extreme level. Full health (11111) was reported by 38.4% of patients using the self-complete version, and by 56.1% of non-infected children. The proxy version percentages were 36.0% for patients and 53.5% for non-infected children. In patients aged 4–5 years, fewer problems were reported on the physical items (‘mobility’, ‘looking after myself’ and ‘usual activity’), with no significant difference compared to the non-infected group (p = 0.235, 0.119, and 0.109, respectively).

Fig. 2 Dimension responses of the EQ-TIPS and the EQ-5D-Y-3L for children with or without COVID-19 infection in different age groups, i.e., aged 0–3 years, 4–5 years, and ≥ 6 years. P-values represent differences between children with COVID-19 and a healthy sample, based on the Chi-square test, or Fisher’s exact test if any cell had an expected count of less than 5. (a) Percentage of dimension responses for the EQ-TIPS for patients with COVID-19 and children without infection aged 0–3 years. (b) Percentage of item responses for the EQ-5D-Y-3L for patients with COVID-19 and children without infection aged ≥ four years

Table 2 presents the known-group validity of EQ-TIPS and EQ-5D-Y-3L summary scores. Those with poorer health status (COVID-19 infection, higher disease severity, multiple symptoms, or poorer OHA) showed higher LSS and lower index and EQ VAS scores. Statistically significant differences were observed between relevant groups (p < 0.05), except for EQ-TIPS LSS between mild and moderate severity (absolute difference = 0.01). Cohen’s d ESs were mostly moderate to high. For the EQ-TIPS LSS, between-group ESs ranged from 0.58 to 0.84, and for the EQ-5D-Y-3L index, from 0.32 to 0.65 in those aged 4–18 years. ESs of the self-complete EQ-5D-Y-3L index score were larger than those of the proxy version in ages ≥ 6 years (0.44 to 1.26 vs. 0.32 to 0.76). These differences in ESs were particularly evident when categorized based on disease severity and OHA. The greatest discriminative ability, with large effect sizes (0.76 to 1.54), was observed between OHA-defined groups, in all age groups. Satisfactory discriminative validity was shown between the COVID-19 and symptom-based groups (ESs: 0.50 to 0.84 and 0.60 to 0.76, respectively). The EQ-TIPS and the EQ-5D-Y-3L tended to show larger ESs between patients in the moderate and severe groups, compared to the differences between the mild and moderate groups.

Table 2 Known-groups validity of the EQ-TIPS LSS, the EQ-5D-Y-3L index score, and EQ VAS (mean [SD]) across different health condition based on with or without COVID-19 infection, disease severity, number of symptoms, and OHA using t-test or ANOVA

The EQ VAS exhibited moderate to high known-group validity for both EQ-TIPS and EQ-5D-Y-3L, with larger effect sizes observed in older age groups (0.17 to 1.39 for 0–3 years, 0.32 to 1.67 for 4–5 years, 0.44 to 1.91 for 6–18 years, respectively). Additionally, the self-complete version (0.51 to 1.91) showed higher ESs compared to the proxy version (0.44 to 1.39).

Table 3 presents the inter-rater agreement on EQ-5D-Y-3L dimensions using data from 445 patient-proxy dyads with COVID-19 and a total of 559 child-parent dyads at baseline. For patients with COVID-19, the Gwet’s AC1 values ranged from 0.470 for ‘having pain/discomfort’ to 0.687 for ‘mobility’, demonstrating moderate to good inter-rater reliability for the descriptive system. The ICC values for index and EQ VAS were 0.657 and 0.815, respectively, indicating good inter-rater reliability for both. The overall sample exhibited similar and slightly better reliability, with Gwet’s AC1 ranging from 0.529 to 0.738 and 0.653 to 0.823 for ICC.

Table 3 The child-parent agreement of the self-complete and proxy versions of the EQ-5D-Y-3L at baseline in children ≥ 6 years and their parent carers (n = 559)

Table 4 shows strong responsiveness of the EQ-TIPS and the EQ-5D-Y-3L for both groups to health improvement based on clinical progress and enhanced Overall Health Assessment (OHA). The EQ-TIPS LSS showed ES of 1.21–1.39, and the EQ-5D-Y-3L index score had ES of 1.00–1.16 and 1.08–1.15 for the proxy and self-complete versions, respectively, in children and adolescents with improved health. The EQ VAS demonstrated the highest responsiveness, with SES ranging from 1.38 to 2.01 for proxy versions and 1.77 to 1.94 for self-complete version.

Table 4 Change in mean (SD) of EQ-TIPS LSS, EQ-5D-Y-3L index score and EQ VAS (with corresponding effect sizes) between illness and recovery based on clinical recovery or improved OHA

Discussion

In this study, we observed acceptable psychometric properties for the Chinese versions of the experimental EQ-TIPS, and for both modes of administration (self-complete and proxy) for the EQ-5D-Y-3L. These findings add to an expanding evidence base for the psychometric robustness of the EQ-5D-Y-3L and provide the first evidence of responsiveness for the EQ-TIPS, which is one of only a handful of preference-weighted HRQoL measures that can be used in the youngest populations. The lack of validated preference-weighted measures for infants and toddlers means that the EQ-TIPS is likely to be widely used, which highlights the importance of providing evidence to support its psychometric performance. Notably, large effect sizes on the EQ-5D-Y-3L were observed when using both clinical changes (disease severity and symptom numbers) and self-rated health changes (OHA) as external criteria for assessing responsiveness (independently of whether proxy or self-report was used), suggesting that the EQ-5D-Y-3L is useful in capturing improvements in health after recovery from COVID-19.

Our study, characterized by a large and diverse sample, broad age representation, and the unique ability to assess responsiveness through clinical recovery criteria, has several strengths. Notably, this is the first study in China to assess the psychometric performance of the experimental EQ-TIPS. Only three studies have been published previously exploring the EQ-TIPS’ measurement properties. While they demonstrated its validity, they provided limited evidence of reliability and none for responsiveness [18, 19, 30]. Our research therefore contributes the first evidence of the EQ-TIPS’ responsiveness, indicating its suitability for capturing COVID-19-related HRQoL improvements in infants and toddlers. Additionally, our study is the first to examine the psychometric properties of the EQ-5D-Y-3L in patients with COVID-19 using the corresponding Chinese value set.

The EQ-TIPS, applied to COVID-19 patients, showed a ceiling effect in the ‘communication’ dimension, echoing findings in a broader paediatric health study [30]. The non-significant difference suggests minimal impact on children’s communicative abilities. Respiratory effects might not significantly affect physical communication skills in young children. Our previous cognitive interviews revealed parental difficulty in responding to this dimension, emphasizing the need for simplified examples. This aligns with the study’s highest ceiling effect (74.8%) in ‘communication’ among COVID-19-infected children.

The EQ-TIPS LSS discriminated effectively between infected and non-infected groups and individuals with varying OHA, exhibiting significant effect sizes. It also discriminated well between those with moderate and severe disease, with an effect size of 0.58, but not between those with mild and moderate disease (ES of 0.004). It is not clear why this should be the case, but it is of note that the EQ VAS also provided poorer results in this youngest group (ES of 0.17) compared to any of the other age groups when examining its ability to discriminate between patients classified as mild and moderate. As an identical EQ VAS is used in the EQ-TIPS as in the EQ-5D-Y-3L, this finding suggests that the lack of discriminatory capacity between mild and moderate disease is not necessarily due to the instrument itself but rather to other factors. These factors could include difficulties that parents have in deciding on an ‘accurate’ score for their child’s health in such young children, who are unable to communicate how they are feeling, and/or questions about whether the criteria used to decide on disease severity are equally suitable across all age groups. Further research is required to clarify these issues.

This study offers the first evidence of the EQ-TIPS’ responsiveness, revealing significant improvement across all dimensions, as shown by the change in the percentage of patients reporting ‘no problems’ from the first to the second visit. Improvements were observed on all dimensions, suggesting that each dimension serves as a useful indicator of how HRQoL evolves as COVID-19 symptoms improve over time in very young children.

Our study supported the known-group validity, inter-rater reliability, and responsiveness of both self-complete and proxy versions of the EQ-5D-Y-3L, as evidenced by the performance of the index scores and EQ VAS. While previous studies validating the proxy version have mainly compared agreement levels between children and proxy-respondents [13,14,15, 31,32,33], our study also compared the validity and responsiveness of the two versions, an area that has received limited attention [34, 35].

In general, the response distribution patterns were similar between the self-complete and proxy versions, regardless of infection status. Our findings align with previous studies, suggesting parental underestimation of children’s HRQoL, particularly in COVID-19 or other infections [10, 16]. Specifically, parents reported more problems at level 3, which could be attributed to parents perceiving children’s symptoms as severe, while children might exhibit greater physical tolerance and cope better with the illness [36]. Additionally, no high ceiling effects were observed in any EQ-5D-Y-3L dimensions for children or adolescents with COVID-19, indicating the effectiveness of capturing these variations and identifying HRQoL-related problems or limitations.

The EQ-5D-Y-3L demonstrates moderate to good discriminative ability across age groups and health categories, with its strongest performance observed in OHA, consistent with its generic scale nature. Although the instrument excels in capturing variations in symptoms and infection status, its capacity to discern subtleties in disease severity may be constrained by its generic design, especially in scenarios where the distinction between ‘mild’ and ‘moderate’ criteria is not substantial. Nonetheless, the statistically significant differences in index values between mild, moderate, and severe categories suggested that the EQ-5D-Y-3L is a sensitive instrument in COVID-19 economic evaluations. This is particularly relevant for interventions such as vaccines, which can prevent poorer health states, or for treatments that improve health. Additionally, the substantial similarities between self-reports and proxy results suggest that results in trials will be relatively comparable, whichever source is used to collect data on the EQ-5D-Y-3L descriptive system. Although proxies tended to score lower than self-report, the difference between categories of disease severity or number of symptoms is quite similar, so gains or losses will be similar whether self-report or proxy reports are used. For instance, moving from severe to moderate disease severity represents a move from 0.57 to 0.74 using self-report, and 0.52 to 0.70 for proxy, an almost identical difference, suggesting that use of one or the other response mode would have little impact in the context of an economic model.

Our study indicated good inter-rater reliability, supporting the idea that self-report and proxy data are likely to be relatively comparable, and that aggregating them, for example for use in an economic model, is likely to be acceptable. In our study, the ICC indicated good reliability for the EQ-5D-Y-3L index score and excellent reliability for the EQ VAS. However, compared to physical items, ‘having pain or discomfort’ and ‘feeling worried, sad, or unhappy’ showed poorer child-parent agreement, with parents reporting more problems and lower index scores and EQ VAS. This aligns with previous studies which found lower agreement for emotional and mental items in paediatric populations with haematological malignancies, idiopathic scoliosis or general population and their parents [13, 15, 37]. The impact of COVID-19 has further highlighted discrepancies across all dimensions, potentially influenced by factors such as parental education, household income, and the infection status of other family members [16]. For example, in our study, 44.6% of proxy respondents of children with COVID-19 reported recent personal infection within the past week.

To our knowledge, this study provides the largest sample for assessing the responsiveness of the EQ-5D-Y-3L in children with COVID-19, especially given the ability to generate index scores based on a recently published value set. Our findings demonstrated good responsiveness to clinical recovery from COVID-19 and health improvements based on overall health assessments (OHA). The considerable effect sizes observed for the EQ-5D-Y-3L index or LSS scores, along with notably larger SESs for EQ VAS, underscore the EQ-5D-Y-3L’s effectiveness in measuring health improvements. Moreover, the effective performance of both the experimental EQ-TIPS and the EQ-5D-Y-3L in children with COVID-19 implies their potential applicability to other prevalent respiratory infectious diseases, which are widespread in many countries. Although our study did not delve into intervention analysis, our future research will explore how specific COVID-19 interventions impact children’s HRQoL. Additionally, investigating whether the new EQ-5D-Y-5L performs as well or better in these patients than the EQ-5D-Y-3L would be of interest [38].

This study has several limitations. Firstly, the data were collected at an academic hospital in Shanghai, involving participants with a relatively higher socioeconomic status and parental education background, limiting generalizability. Secondly, the instruction section of the EQ-TIPS and the EQ-5D-Y-3L was slightly modified to emphasize the impact of the COVID-19 pandemic, which could potentially affect responses when compared to use of the standard instruction.

Conclusion

In conclusion, the study results show that the experimental EQ-TIPS and the EQ-5D-Y-3L are reliable and valid instruments for assessing the impact of COVID-19 on the HRQoL of children and adolescents, including very young children. Additionally, both instruments are responsive to change as children’s COVID-related health status evolves over time. The study also provides the first application of the new EQ-5D-Y-3L Chinese value set in a clinical population and shows that the values discriminate well between relevant disease groups. These instruments will therefore likely be useful in COVID-related clinical and resource allocation decision-making and in monitoring the well-being of infants, children and adolescents affected by COVID-19 and respiratory infections. Further research using these instruments to explore the impact of specific treatments for COVID-19 would be of interest.

Appendix

Appendix-Table 1 COVID-19 severity and symptom distribution based on symptom numbers at baseline
Appendix-Table 2 Known-group validity: dimension-level response distributions and “no problem” reported for individual dimensions across disease severity
Appendix-Table 3 Responsiveness: change in percentage of respondents reporting “no problem” for each dimension between illness and clinical recovery or improved OHA