Background

Paediatric Patient-Reported Outcome Measures (PROMs) have become increasingly important in health outcomes research, with increasing use in clinical trials and in the evaluation of health systems [1, 2]. Multi-attribute PROMs aim to capture the subjective constructs of health across physical, social and psychological functioning [3, 4]. There are two broad categories of measures: disease-specific and generic. Disease-specific measures are typically developed to measure the effects of a specific disease or condition [5] and are argued to be more responsive in that they detect disease-specific clinical changes [6]. Generic measures can be used in a wide variety of health conditions, and the dimensions or items included apply to diverse conditions and populations [4, 6,7,8]. Generic measures can therefore be used to compare health across different health conditions or populations. They have a wider application and can be used in population health surveys, burden of disease studies, epidemiological studies, screening, describing health status, developing management plans for individual patients, and informing clinical policy and resource allocation decisions [6, 9,10,11,12,13,14].

There are currently over 35 published generic PROMs for children and adolescents younger than 18 years [1], of which the EQ-5D-Y and the Pediatric Quality of Life Inventory (PedsQL) 4.0 Generic Core Scales have been frequently cited [1, 2, 15]. The EQ-5D-Y was adapted from the EQ-5D, an adult measure, to include youth-friendly wording and examples [16]. Respondents, aged 8–15 years, can self-report their health, as experienced on that day, across five dimensions and a Visual Analogue Scale (VAS) measuring general health from 0 (worst health) to 100 (best health). The dimensions include mobility, self-care, usual activities, pain or discomfort and emotional state. The original three-level version, EQ-5D-Y-3L (Y-3L), records scores on three levels of severity: no problems, some problems or a lot of problems [16]. The levels of report have recently been expanded to five on the EQ-5D-Y-5L (Y-5L): no/not, a little bit, some/quite, a lot/really or cannot/extreme(ly) [17]. The increase from three to five levels has been shown to improve the discriminatory power and reduce the ceiling effect of the measure [18, 19].

The PedsQL aims to measure the core dimensions of health described by the World Health Organisation (WHO), namely physical, emotional and social functioning, with an additional dimension of school functioning [20]. The PedsQL has multiple age versions available, with questions relevant to the development of the child. The measures for children (aged 8–12 years) and adolescents (aged 13–18 years) both include 23 self-reported items which assess the frequency of problems (never, almost never, sometimes, often, almost always) in the past month. Although there is overlap between the Y-3L and Y-5L dimensions and the PedsQL items, the most notable difference is the inclusion of the school functioning questions in the PedsQL descriptive system. Although ‘going to school’ is included as an example in the Y-3L and Y-5L descriptive systems, there are no specific questions on academic performance. The expanded Y-5L, with its increased number of response levels, is now more similar to the PedsQL in this respect and may show improved association with the PedsQL compared to the Y-3L. The aim of this study was to compare the feasibility, convergent validity, concurrent validity and known group validity of the Y-5L, Y-3L, PedsQL and Self-Rated Health (SRH) question.

Methods

Study design and participants

An observational, analytical cohort study was conducted with the Y-5L, Y-3L, PedsQL and SRH question. A head-to-head comparison of the Y-5L and Y-3L instrument performance is presented elsewhere [18, 21].

Three research settings in Cape Town, South Africa, each with children/adolescents in different health states, were used. Although details of socio-economic status were not captured, children living in the same geographical area were recruited to ensure that they were from similar socio-economic backgrounds (low to middle income).

Children/adolescents attending two mainstream schools, which admit generally healthy learners without special education needs, were used to recruit a general population sample. Children/adolescents with stable chronic health conditions were recruited from five schools for learners with special education needs. These schools have specialised education services for learners with normal intellect diagnosed with physical disability and/or learning disability. Children/adolescents requiring acute medical treatment were recruited from the inpatient wards of an acute tertiary paediatric hospital and a paediatric orthopaedic hospital.

All children/adolescents aged 8–15 years at each facility who were able to read and write English, the most commonly spoken and written language in South Africa [22], were eligible for the study. Only those who returned signed informed consent and assent were included in the study. Those who were critically ill or medically unstable were excluded as the research may have been too distressing. The sample size was adequately powered (95%) to detect a difference in correlation of scores between the three condition groups with a small effect size of 0.4 and a significance level of 0.05.

Instruments

EQ-5D-Y

The official Y-3L English version for South Africa was used in this study. The experimental Y-5L English version for the United Kingdom was tested for equivalence in English for South Africa by the EuroQol group [23]. Each version consists of five dimensions, namely Mobility (walking about), Looking After Myself (washing and dressing), Usual Activities (going to school, hobbies, sports, playing, doing things with family or friends), Pain or Discomfort and Worried, Sad, or Unhappy. There is also a general rating of health on a VAS of 0 (worst health) to 100 (best health). The original youth version, Y-3L, describes health on three levels (no problems, some problems and a lot of problems) resulting in 243 (3⁵) health states [16, 24]. The newly expanded version, Y-5L, describes health on five levels [no/not, a little bit, some/quite, a lot/really, cannot/extreme(ly)] resulting in 3125 (5⁵) health states.

The three or five levels of the descriptive system are expressed with a five-digit code. For example, the Y-3L health state 11223 describes someone with no problems with Mobility, no problems with Looking After Myself, some problems with Usual Activities, some Pain or Discomfort and very Worried, Sad or Unhappy. The best health state described by the instrument is coded as 11111, describing ‘no problems’ in each of the dimensions [23]. Although the Y-3L has a preference-based score, the Y-5L does not [25,26,27]. As such, a level sum score (LSS) was used to describe the responses on the descriptive system, where the level labels are treated as numeric data. The best possible score is (1 + 1 + 1 + 1 + 1) = 5 and the most severe score for the Y-3L is (3 + 3 + 3 + 3 + 3) = 15. All other Y-3L health states have an LSS between 5 and 15, with a larger score indicating a worse health state. The Y-5L is similarly scored, with an LSS ranging between 5 and 25 [28]. The LSS is a crude score which does not account for preference of dimensions or weighting of responses [29, 30] but gives some indication of the performance of the dimensions between the Y-3L and Y-5L. Results from Y-3L value sets show that there is a difference in the rank order of dimensions and the scores attributed to dimensions when compared to the adult EQ-5D-3L [25,26,27]; as such, comparing LSS may give a better indication of the performance of the Y-3L compared to the Y-5L than using the adult EQ-5D-3L and EQ-5D-5L value sets. The Y-5L VAS was reported for this study.
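The level sum score described above can be illustrated with a minimal sketch. The function names level_sum_score and number_of_health_states are hypothetical and are not part of any EuroQol software; the sketch only assumes that a health state is supplied as a five-digit string such as 11223.

```python
def level_sum_score(health_state: str, levels: int) -> int:
    """Return the level sum score (LSS) of a five-digit EQ-5D-Y health state.

    `levels` is 3 for the Y-3L and 5 for the Y-5L (hypothetical parameter name).
    """
    if len(health_state) != 5 or any(ch not in "0123456789" for ch in health_state):
        raise ValueError("health state must be a five-digit code, e.g. '11223'")
    digits = [int(ch) for ch in health_state]
    if any(d < 1 or d > levels for d in digits):
        raise ValueError(f"each level must lie between 1 and {levels}")
    # The level labels are treated as numeric data and simply added up.
    return sum(digits)


def number_of_health_states(levels: int, dimensions: int = 5) -> int:
    """Number of distinct health states: levels to the power of dimensions."""
    return levels ** dimensions
```

Under these assumptions the Y-3L state 11223 has an LSS of 1 + 1 + 2 + 2 + 3 = 9, and the ranges quoted in the text (5 to 15 for the Y-3L, 5 to 25 for the Y-5L) follow from the best and worst states.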

Pediatric Quality of Life Inventory (PedsQL)

The 23-item PedsQL 4.0 Generic Core Scales for children aged 8–12 years and 13–18 years were used as appropriate [31]. Both age versions of the PedsQL consist of four dimensions of functioning: physical, emotional, social, and school, with 8, 5, 5 and 5 items respectively. Each item is scored on a Likert scale from 0 to 4 (never a problem, almost never, sometimes, often, or almost always a problem). Items are reverse scored and transformed to a 0–100 scale: 0 = 100, 1 = 75, 2 = 50, 3 = 25, 4 = 0. Dimension scores are calculated as the sum of the item scores divided by the total number of items. A total score is similarly generated by summing the dimension scores over the total number of dimensions, giving an overall Health Related Quality of Life (HRQoL) score. Scores for scales with more than 50% missing data are not computed. A higher PedsQL score indicates a better HRQoL [32,33,34].
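The scoring rules above can be sketched as follows. This is an illustration under stated assumptions rather than the official PedsQL scoring algorithm: the function names are hypothetical, missing responses are represented as None, and a scale score is assumed to be averaged over the items actually answered when no more than 50% are missing.

```python
from typing import Optional, Sequence


def transform_item(raw: int) -> float:
    """Reverse score a 0-4 response onto the 0-100 scale (0 -> 100, 4 -> 0)."""
    if raw not in (0, 1, 2, 3, 4):
        raise ValueError("PedsQL responses are coded 0 to 4")
    return 100.0 - 25.0 * raw


def scale_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Mean transformed score of one scale, or None if it cannot be computed."""
    answered = [transform_item(r) for r in responses if r is not None]
    missing = len(responses) - len(answered)
    # Scales with more than 50% missing data are not computed.
    if not answered or missing > len(responses) / 2:
        return None
    # Assumption: the mean is taken over the items that were answered.
    return sum(answered) / len(answered)


def total_score(dimension_scores: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of the available dimension scores, as described in the text."""
    available = [s for s in dimension_scores if s is not None]
    if not available:
        return None
    return sum(available) / len(available)
```

For example, a five-item scale answered 0, 1, 2, 3 and 4 transforms to 100, 75, 50, 25 and 0, giving a scale score of 50, whereas a five-item scale with three unanswered items yields no score.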

Self-Rated Health (SRH)

The Self-Rated Health (SRH) question asks the child to describe their general health today as: ‘excellent’, ‘very good’, ‘good’, ‘fair’ or ‘poor’. This question has been shown to be a valid measure of subjective health in children and adolescents [35]. The items were scored numerically for data analysis with excellent scored 5 and poor scored 1. The SRH question is expected to capture general health similarly to the EQ-5D-Y VAS [36, 37].

Procedure

Ethics approval was obtained from the University of Cape Town, Faculty of Health Sciences, Human Research Ethics Committee (HREC 154_2019). The study was carried out in accordance with the Declaration of Helsinki for research involving human participants [38] and the recommended Covid precautions.

Children/adolescents aged 8–15 years admitted to either of the acute inpatient hospital settings were recruited during an onsite visit. For those who were willing and provided consent and assent, the parent was asked to complete the socio-demographic information for the child, and the children/adolescents were asked to self-complete the Y-5L, PedsQL, SRH and Y-3L in that order. The Y-5L was presented first based on an adult study comparing the EQ-5D-5L and EQ-5D-3L, which found that if the EQ-5D-3L was presented first the additional levels on the EQ-5D-5L were not considered [39]. Children and adolescents recruited at one of the hospitals completed the questionnaires in a quiet, private space with supervision from the researcher.

Due to the constraints of the Covid pandemic, children and adolescents attending either the mainstream schools or schools for learners with special education needs were recruited through information leaflets that were sent home to them and their parents. For those who were willing and provided consent and assent, the instruments were self-completed by the child/adolescent at home under the supervision of their parent. The accompanying information clearly stated that parents should not assist with or influence the completion of the instruments. A reminder was sent out to learners and parents who had not responded after one and two weeks.

Data management and analysis

General performance and feasibility

The Y-5L, Y-3L, PedsQL and SRH responses and descriptive data were summarised in terms of frequency of responses. The feasibility was assessed by comparing the number of missing values across measures.
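Missing values can be counted at two levels: the percentage of missing responses and the number of respondents with at least one missing response. The sketch below shows both counts. It assumes each respondent's answers are stored as a list with None for a missing response; the function names are hypothetical and this is not the SPSS procedure used in the study.

```python
from typing import Optional, Sequence


def percent_missing(responses: Sequence[Optional[int]]) -> float:
    """Percentage of responses that are missing (coded as None)."""
    if not responses:
        raise ValueError("at least one response slot is required")
    return 100.0 * sum(r is None for r in responses) / len(responses)


def respondents_with_missing(records: Sequence[Sequence[Optional[int]]]) -> int:
    """Number of respondents with one or more missing responses."""
    return sum(any(r is None for r in record) for record in records)
```

Reporting both counts matters for longer instruments, where a few respondents who skip every item can dominate the percentage of missing responses.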

Concurrent validity

The concurrent validity of the Y-3L and Y-5L dimension scores was assessed by comparing them to the individual PedsQL items and sub-scale scores using Spearman correlations (rs).

It was anticipated that the Y-5L/Y-3L Mobility dimension would be associated with the PedsQL items of hard to walk 100 m and hard to run, and with the Physical Health Summary Score. The Y-5L/Y-3L Looking After Myself dimension would be associated with the PedsQL item of hard to bath/shower. Y-5L/Y-3L Usual Activities would be associated with participating in sport/exercise, household chores, missing school because of not feeling well and missing school to go to the doctor, and Y-3L/Y-5L Worried, Sad or Unhappy would be associated with the items of Sad and Worry. PedsQL summary and total scores were compared to EQ-5D-Y LSS and VAS scores and SRH scores with Pearson’s correlation coefficient. Correlation coefficients were interpreted according to Cohen: 0.1–0.29 low association, 0.3–0.49 moderate association and ≥ 0.5 high association [40].
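As an illustration of the correlation analysis described above, the following sketch computes a Spearman coefficient as the Pearson correlation of average ranks and labels its size with the Cohen thresholds quoted in the text. The function names are hypothetical, the label "negligible" for values below 0.1 is an assumption that the text does not define, and the study itself used SPSS and Statistica rather than this code.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cohen_label(r: float) -> str:
    """Interpret the absolute size of a correlation using the quoted bands."""
    size = abs(r)
    if size >= 0.5:
        return "high"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "low"
    return "negligible"  # assumption: not defined in the text
```

The label uses the absolute value because a higher LSS indicates worse health while a higher PedsQL score indicates better health, so correlations between the two are expected to be negative.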

Known-group validity

Children with health conditions receiving acute or chronic health care and those from the general population were compared for known-group validity. Analysis of variance (ANOVA) with Tukey post hoc analysis was used to compare the Y-5L and Y-3L LSS and VAS scores, PedsQL sub-scales, summary and total scores and the SRH score (which was treated as a scale variable for this analysis).
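The omnibus test behind the known-group comparison can be sketched as a one-way ANOVA F statistic. This is a minimal illustration only: the function name is hypothetical, the Tukey post hoc step and the p-value are not included, and the study's analysis was run in SPSS and Statistica.

```python
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n <= k:
        raise ValueError("need at least two non-empty groups and n > k")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        raise ValueError("F is undefined when there is no within-group variance")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With three groups (for example acute, chronic and general population), the degrees of freedom between groups are 2, and a large F indicates that the group means differ by more than the within-group spread would suggest.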

All data analyses were conducted using SPSS Windows 27.0 (IBM SPSS Inc., Chicago, IL, USA) and Statistica Windows Version 13.0 (TIBCO Software Inc., Palo Alto, CA, USA).

Results

Figure 1 summarises the recruitment and enrolment of participants. The data of 550 children/adolescents were included for analysis. Reasons for refusal of consent/assent were not collected.

Fig. 1 Recruitment into the study

The mean (SD) age of the children/adolescents across the age groups was 11.3 (1.6) years (range 8–15 years). The children/adolescents with chronic conditions were older [mean age 12.2 years (SD 1.9 years)] than those in the acute medical setting or from the general population [mean age 11.3 years (SD 1.6 years)] (F = 13.08; p < 0.001). Sex of participants was equally distributed for the overall sample; however, those with chronic conditions had a higher proportion of males (62%) compared to the general population (40%) (χ² = 20.30; p < 0.001). Most of the children/adolescents needing acute medical management were receiving orthopaedic management, whereas those with chronic conditions had a physical disability or learning difficulty (Table 1). The children/adolescents in the general population group had minor health conditions including asthma, eczema and allergy, and other conditions including epilepsy, diabetes, glaucoma, and a cardiac lesion.

Table 1 Descriptive statistics of the sample

General instrument performance and feasibility

Table 2 summarises the frequency of responses across the four measures.

Table 2 Frequency of responses for the EQ-5D-Y-5L, EQ-5D-Y-3L, Self-Rated Health (SRH) score and PedsQL

The Y-5L had a low number of missing scores (0–2%) for individual dimensions with a total of 3.5% of missing responses across all five dimensions and 5% missing for dimension and VAS responses. The Y-3L had an even lower number of missing scores (0–1%) for individual dimensions and only 1% across all dimensions. There were only three children/adolescents with missing responses on the Y-3L dimensions compared to 21 on the Y-5L. Five percent of the participants did not complete the second VAS for the Y-3L. Ten respondents did not complete the SRH question resulting in 2% missingness.

A total of 43 children/adolescents had missing responses on the PedsQL, with 16 of them not completing any items. The 16 children who did not complete the measure contributed a large percentage of the missingness; the missingness in those who did complete the measure (n = 534) resulted in a total of 19% missing responses across the 23 items. The 16 children who did not complete the measure were all from the chronically ill or general population sample and completed the questionnaire at home; there were no other relevant demographic factors for this group. The number of missing responses at an item level for those who did complete the questionnaire ranged from 0 to 1%. The missing items were highest for the 8 items of physical health, with 9% missingness compared to 4%, 2% and 4% for the 5 items in the emotional, social, and school sub-scales respectively.

Concurrent validity

Table 3 shows that the Y-3L and Y-5L had similarly high associations with similar items on the PedsQL generic measure and moderate associations with related items. Table 4 shows the concurrent validity of the Y-3L and Y-5L VAS and LSS with the PedsQL and SRH scores. All Y-3L and Y-5L scores were moderately associated with the SRH score across condition groups. The PedsQL total score was moderately and significantly associated with the Y-3L/Y-5L LSS and VAS scores across all condition groups, except for the VAS score in the group of children with acute illness. The Y-3L and Y-5L LSS, physical and psychosocial summary scores had significant moderate associations for children/adolescents with a stable chronic condition and for the general population. The emotional and social sub-scale scores showed greater association with the Y-5L and Y-3L scores than the school sub-scale. The Y-3L and Y-5L VAS and LSS showed greater association with the SRH than the PedsQL did for children with a chronic condition and from the general population. The PedsQL had a weak and non-significant correlation with the SRH score for children with an acute condition, whereas the Y-3L and Y-5L both showed moderate significant correlations.

Table 3 Spearman correlation of EQ-5D-Y-5L and EQ-5D-Y-3L dimension scores and PedsQL item and sub-scale scores
Table 4 Summary of concurrent validity with the PedsQL summary and sub-scores, Self-Rated Health and EQ-5D-Y-3L and EQ-5D-Y-5L LSS and VAS scores with Pearson correlation

Known-group validity

Table 5 presents the known-group validity of those with acute or chronic health conditions and the general population. The Y-5L (LSS and VAS), Y-3L (LSS and VAS) and PedsQL Physical Health Summary Score were able to detect significant differences between health groups. The PedsQL Psychosocial Health Summary Score and the PedsQL Total scores were able to detect differences between the general population and ill health, but not between acute and chronic groups.

Table 5 Comparison of known-group validity for health condition across the EQ-5D-Y-5L, EQ-5D-Y-3L, PedsQL and Self-Rated Health (SRH) question

Discussion

The aim of this study was to investigate the feasibility, concurrent validity and known group validity of the Y-3L, Y-5L, PedsQL and SRH.

To our knowledge, this was the first study to compare the Y-3L and Y-5L to the PedsQL generic measure and SRH. Children/adolescents receiving acute medical management and with a stable chronic condition were considered a suitable population for comparison of the measures as there was a spread of disease severity, and consequently a spread of scores. Furthermore, the Y-3L and PedsQL [41,42,43,44] and Y-3L and SRH [37, 45] have been compared in previous studies.

The number of missing responses was higher on the longer PedsQL measure (18%) than on the shorter measures of Y-5L (3.5%), Y-3L (1%) or SRH (2%), when the respondents who did not complete any PedsQL items were excluded. This is further evident from the number of respondents with missing responses, with 27 on the PedsQL, 21 on the Y-5L, 10 on the SRH and 3 on the Y-3L. This suggests that it is potentially the number of items and the number of response levels which contribute to the missing data. Reasons for missing data were not recorded and cannot be commented on.

The distributions of responses on the Y-3L, Y-5L and PedsQL are difficult to compare. The Y-3L/Y-5L asks the respondents to rate dimensions on a severity scale for today, whereas the PedsQL asks about the frequency of problems over the last month. Although there is overlap in items/dimensions, the reporting of them is different and as such the distribution will differ across the five/three levels of report. Despite these differences in the descriptive systems, there were high correlations between Y-3L/Y-5L dimensions and similar items on the PedsQL. There was further moderate correlation with related items, e.g. mobility also showed correlations with “hard to bath/shower”, “doing household chores”, “low energy”, “hard to keep up with others” etc. Although it was postulated that the Y-5L would show higher association with the PedsQL than the Y-3L, there was no systematic difference between them. Most of the other studies compared Y-3L dimension scores to PedsQL sub-scores or total scores [41, 44, 45], except Scalone et al. [43], who found a low to moderate correlation on similar items in a large sample of the general population and a small number of ill children. Comparison of Y-3L dimensions and PedsQL sub-scores showed moderate correlation in general population samples [44] and those with health conditions [36, 41]. Both the Y-3L and the PedsQL were used in children with acute Thalassemia, with acceptable reliability on Cronbach’s alpha, but no comparison of performance between the two measures was made [46]. Similarly, the EQ-5D-Y-5L and PedsQL both showed a decrement in HRQoL with sleep deprivation, physical activity and screen time in a large general population sample in Hong Kong, but no comparison between instrument performance was made [47].

Although the EQ-5D-Y descriptive system does not explicitly include school function, there were moderate correlations between the PedsQL items of “missing school because of not feeling well” and “missing school to go to the doctor and hospital”, the school sub-score and the Y-3L and Y-5L dimension of Usual Activities. There was a low but significant correlation between Usual Activities and the PedsQL item of “trouble keeping up with schoolwork”. Furthermore, the LSS of the Y-3L and Y-5L showed moderate correlations, indicating that the EQ-5D-Y dimensions do in fact capture school functioning.

The PedsQL Total Score had a greater correlation with the Y-3L/Y-5L LSS than with the VAS scores, which was to be expected as the VAS measures general health and may include other influences on health which are not captured by the dimensions or items of the corresponding measures. The association between the PedsQL Total Score and the Y-3L/Y-5L LSS was greater for those with chronic conditions or from the general population. Although the Physical Health scores showed moderate to strong correlations for children across health groups, the Psychosocial scores showed a weak to moderate correlation, with weaker correlations in those with an acute or chronic condition compared to the general population. This cannot be compared to other studies as they did not compare instruments across health groups but rather for the entire sample [36, 41, 43, 44]. The weaker correlations could be attributed to the difference in recall period: today versus the past month. The EQ-5D-Y is more sensitive in capturing acute health problems or stable health conditions (such as a stable chronic illness) but may miss those with fluctuating health over time [48]. As such, the PedsQL condition-specific acute or chronic measures may be more appropriate, with the disadvantage that different health conditions cannot be compared directly [9]. Despite this, the Y-5L and Y-3L LSS were able to differentiate between health conditions (acute, chronic and general population), whereas the PedsQL Total Score showed no difference between those with acute and chronic illness. This was similarly found for the Psychosocial Health Summary Score; however, the Physical Health Summary Score showed significant differences between the three groups.

The SRH Question showed stronger correlation to the VAS across all condition groups which was similarly reported in a Swedish study with the Y-3L VAS [37]. The correlation was stronger in the general population group indicating that the VAS may be more sensitive in detecting differences in general health than the five levels of the SRH item. This is confirmed in this study in that VAS was significantly different between those with different health conditions but there was no difference between those with acute and chronic health problems on the SRH question.

Due to the limitations of the Covid pandemic on recruitment of children/adolescents at schools, there may be non-response bias. Although parents and children/adolescents were explicitly instructed that the measures should be completed by the child without influence from others, there was no way to ensure this in the chronic condition and general population samples. The correlation of the Y-3L, Y-5L, PedsQL and SRH may have been influenced by the relatively high ceiling effect, most notably for the general population sample. The non-randomised order of the questionnaires could have influenced the results and contributed to an order effect. As the Y-5L is a newly developed measure, it has first been tested in English, the source language, in South Africa before translation and adaptation into the other eleven South African languages. Although English is widely spoken and written in South Africa, inclusion of only English questionnaires could have excluded parts of the population who speak or write in one of the other ten official languages of South Africa. No details on ethnicity were collected on which to judge the generalisability of the sample; however, no one was excluded based on gender, race or religion.

Conclusion

The results of this study show that the Y-3L and Y-5L had comparable psychometric validity to the PedsQL. When considering the choice between the PedsQL, Y-5L and Y-3L, these study results indicate that the EQ-5D-Y instruments (Y-3L and Y-5L) are recommended for studies assessing known-group validity, particularly between children with and without illness, or where missing data should be minimised. The PedsQL generic measure does not capture acute health problems but shows improved performance, specifically a lower ceiling effect and a greater spread of responses, in those with a more stable health condition. As such, the PedsQL generic measure may be preferable in future studies including the general population. Distribution of responses across the Y-3L, Y-5L and PedsQL cannot be compared, and comparison of the Y-3L to the Y-5L was not the aim of this paper.

When considering the choice between the Y-5L and the Y-3L, there was no systematic difference in validity between these instruments or between the Y-5L or Y-3L and the PedsQL. Thus, the selection of EQ-5D-Y measure for future studies should be guided by the characteristics of the population to be included; for example, the Y-5L may be preferred in a population where a higher ceiling effect is anticipated. In contrast, the Y-3L may be preferred in a study which includes a younger cohort or where a lower literacy level is anticipated. Further research is recommended comparing the performance of the PedsQL and the Y-3L and Y-5L in homogeneous disease groups to guide future researchers on the selection of the most appropriate instrument.