Introduction

Juvenile idiopathic arthritis (JIA) encompasses all forms of arthritis of unknown cause with onset before 16 years of age [1, 2]. It is the most common chronic rheumatic disease in children [1, 2], affecting approximately 1 per 1,000 Canadian children and 0.07–4.01 per 1,000 children worldwide [3, 4]. Different subtypes vary in clinical presentation, pathogenesis, and prognosis; however, they share common symptoms, such as morning joint stiffness, joint pain, and joint swelling [5]. If untreated, JIA can lead to joint damage, functional limitation, and severe disability [6]. There is no cure, and treatment mainly manages pain and inflammation through a combination of pharmacological, physical, and psychosocial therapies to control symptoms and avoid joint damage [4, 7]. Among pharmaceutical treatments, the relatively new biologic disease-modifying antirheumatic drugs, commonly referred to as biologics, have shown high efficacy for disease remission and preventing long-term disability; however, they have side effects, high costs, and unknown long-term safety [7, 8].

Patient-reported outcome measures (PROMs) are used to assess functional status, pain, and health-related quality of life (HRQL) for JIA patients in clinical settings. They provide an inclusive picture of health status and help clinicians understand the impact of JIA on children’s everyday life and the life course [9,10,11]. Measuring PROMs reflects whether clinical care makes patients feel better [12, 13]. The commonly used JIA-specific PROMs, such as the Childhood Health Assessment Questionnaire (CHAQ), incorporate outcomes of patients’ direct interest, and can provide an effective measurement of health status and predict patient outcomes [12, 14, 15].

Several generic pediatric HRQL measures, including the EQ-5D-Y-3L, have been validated and used among children with JIA [16]. Valuation studies on the EQ-5D-Y-3L, published or underway, support the use of the EQ-5D-Y-3L in future economic evaluation in JIA, which is important given the high costs of JIA treatments [16, 17]. The EQ-5D-Y-3L has only 3 levels in each dimension, potentially leading to large ceiling effects [18]. The EQ-5D-Y-5L, still experimental, is a 5-level version revised from the EQ-5D-Y-3L that aims to reduce ceiling effects and enhance sensitivity [19]. It has been validated in several disease areas and the general population, mainly among children aged 8 years or older, via the self-report or the caregiver-proxy (reported by a caregiver on behalf of the child) version, and found to have smaller ceiling effects, better responsiveness and discriminative power (versus the EQ-5D-Y-3L), good test-retest reliability, convergent validity, and known-group validity [20,21,22,23,24,25,26,27]. Among studies exploring the psychometric properties of the caregiver-proxy version, good agreement between the proxy and the self-report version was identified [25, 26]. There remains a knowledge gap regarding the performance of the EQ-5D-Y-5L among children under 8 years old, and the psychometric properties of the EQ-5D-Y-5L among patients with JIA of all ages.

As young children have difficulty completing the PROM by themselves, a proxy version is required. For both the CHAQ and EQ-5D-Y, the age threshold for mandatory proxy-version is 8 years [28, 29]. Many JIA patients have the disease onset at a very young age (2–4 years) [30]. Valid tools are necessary to assess their HRQL before age 8.

This paper focuses on assessing the psychometric properties of the parent-proxy version of the EQ-5D-Y-5L among patients with JIA in terms of its ceiling effect, construct validity (convergent validity, divergent validity, and known-group validity), and informativity.

Methods

Data

This validation study used data collected from the Understanding Childhood Arthritis Network (UCAN) Canadian-Dutch collaboration (CAN-DU) [31] study cohort between October 2019 and May 2023. Three cohorts were analyzed in this study: patients with new-onset JIA, patients starting or restarting biologics, and patients stopping biologics. Depending on disease progression, each study participant could be in several cohorts throughout the study period. In each cohort, patients had a baseline clinical visit and at least one follow-up visit. Clinical data and the parent-proxy versions of the CHAQ and the EQ-5D-Y-5L were collected during each visit. The analytical sample for this cross-sectional validation comprised data from all baseline visits where EQ-5D-Y-5L data were collected within a 30-day window. For clinical data and CHAQ variables with missing values, we performed the analyses pertaining to a specific variable using the complete data for that variable and reported the extent of missing values. Ethics approval was granted by the Conjoint Health Research Ethics Board at the University of Calgary (REB17–1563), the Health Research Ethics Board at the University of Alberta (Pro00106423), and the Ethical Board of Utrecht (18–474).

Measures

Clinical measures

Patient age, biological sex, country of residence (Canada, Netherlands), cohort (Footnote 1), JIA subtypes (Footnote 2), and disease duration at the time of visit (up to 12 months before this visit, more than 12 months before this visit) were collected. We used several disease severity and disease activity measures including disease status assessed by physicians (active, inactive), active joint count, presence of morning joint stiffness, presence of joint pain, physician global assessment of disease activity (10-point VAS), and disease activity (inactive disease, minimal disease activity, moderate disease activity, and high disease activity) assessed by the clinical juvenile arthritis disease activity score-10 (cJADAS10, a composite disease activity score for JIA calculated from physician’s global rating of overall disease activity, parent/child ratings of well-being, and count of active joints) [32, 33].

EQ-5D-Y-5L parent-proxy version

The EQ-5D-Y-5L [19] is a generic HRQL measure for the young population. The descriptive system measures the health status of “today” through five dimensions including mobility, looking after myself, doing usual activities, having pain or discomfort, and feeling worried, sad, or unhappy and a vertical visual analog scale (VAS) ranging from 0 (the worst health) to 100 (the best health). The EQ-5D-Y-5L has five levels: 1 “no problems”, 2 “a little bit of problems”, 3 “some problems”, 4 “a lot of problems” and 5 “extreme problems/cannot do” [34]. There is currently no value set for the EQ-5D-Y-5L to summarize the five dimensions to a single preference-based index. An alternative method to summarize the descriptive system is by calculating a level summary score (ranging from 5 to 25), which is a total sum score of the severity levels of each dimension [35, 36].
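The level summary score described above can be illustrated with a minimal sketch. The dimension keys and the function name are hypothetical labels chosen for this illustration, not part of the instrument or of the study's code; the sketch only assumes that each dimension is coded as an integer level from 1 to 5.

```python
# Hypothetical keys for the five EQ-5D-Y-5L dimensions (illustrative names only).
DIMENSIONS = (
    "mobility",
    "looking_after_myself",
    "usual_activities",
    "pain_discomfort",
    "worried_sad_unhappy",
)


def level_summary_score(response):
    """Sum the severity levels (1-5) over the five dimensions; the result is 5-25."""
    levels = [response[d] for d in DIMENSIONS]
    if any(level not in (1, 2, 3, 4, 5) for level in levels):
        raise ValueError("each dimension level must be an integer from 1 to 5")
    return sum(levels)


# Made-up response: a little bit of problems in two dimensions, some problems in one.
example = {
    "mobility": 2,
    "looking_after_myself": 1,
    "usual_activities": 2,
    "pain_discomfort": 3,
    "worried_sad_unhappy": 1,
}
print(level_summary_score(example))  # 9
```

A profile with no problems in any dimension gives the minimum of 5, and a profile with extreme problems in every dimension gives the maximum of 25.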

There is currently no official recommendation regarding the user age range for administering the EQ-5D-Y-5L. The EQ-5D-Y-3L is not applicable to children aged 0–3 years old, and a proxy version should be used for children aged 4–7 years old. As our study population includes children aged 4–7 years, we used the parent-proxy versions of the EQ-5D-Y-5L and CHAQ for all ages in this paper.

CHAQ parent-proxy version

The CHAQ [37] is a validated measure for children with JIA. It is designed to measure health status “over the past week”. The CHAQ includes a disability section, a discomfort section, and an overall assessment of the health status affected by arthritis. The disability section has eight dimensions: dressing and grooming, arising, eating, walking, hygiene, reach, grip, and activities. The dimensional score is determined by selecting the highest score among all items within each dimension. The item score ranges from 0 to 3: 0 “no difficulty”, 1 “some difficulty”, 2 “much difficulty”, 3 “unable to do”. If the use of aids or devices is reported for items within a particular dimension, the minimum score would be 2. The average score of all eight dimensions is the disability index (0–3, higher score reflecting greater disability). In the discomfort section, a horizontal VAS (0-100) is used to measure the presence of pain and its severity, with 0 representing “no pain” and 100 representing “very severe pain”. In the overall health status section, a horizontal VAS is used, with 0 representing “very well” and 100 representing “very poor”.
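The scoring rule for the disability section can be sketched as follows. This is a simplified illustration of the rule as described in the paragraph above, with hypothetical function names and made-up item scores; it does not cover how unanswered items are handled, which the text does not specify.

```python
def dimension_score(item_scores, uses_aid_or_device=False):
    """Highest item score (0-3) in a dimension, raised to at least 2 if an aid or device is reported."""
    score = max(item_scores)
    if uses_aid_or_device:
        score = max(score, 2)
    return score


def disability_index(dimension_scores):
    """Average of the eight dimension scores; ranges from 0 to 3."""
    if len(dimension_scores) != 8:
        raise ValueError("expected eight dimension scores")
    return sum(dimension_scores) / 8


# Made-up example: item scores per dimension, in the order listed in the text.
scores = [
    dimension_score([0, 1, 0]),                        # dressing and grooming -> 1
    dimension_score([0, 0], uses_aid_or_device=True),  # arising -> 2 (aid reported)
    dimension_score([0, 0, 0]),                        # eating -> 0
    dimension_score([1, 0]),                           # walking -> 1
    dimension_score([0, 0, 0]),                        # hygiene -> 0
    dimension_score([0, 0, 0, 0]),                     # reach -> 0
    dimension_score([0, 0, 0]),                        # grip -> 0
    dimension_score([0, 0, 0]),                        # activities -> 0
]
print(disability_index(scores))  # 0.5
```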

Measurement properties

We performed descriptive analyses to describe the distribution of the EQ-5D-Y-5L. We estimated the mean (standard deviation, SD) and median (interquartile range, IQR) of the level summary score and explored the distribution of responses to each dimension. We analyzed the ceiling effect (assessed by the proportion of respondents reporting “no problems” in all dimensions) and floor effect (the proportion of respondents reporting “extreme problems/cannot do” in all dimensions). The absence of a ceiling/floor effect means that few responses are clustered at the upper (the best health) or lower (the worst health) bound of the measurement scale, indicating that the instrument can measure a wide range of values. A threshold of 15% was used to determine whether a ceiling/floor effect was present [38].
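The ceiling and floor calculation can be sketched as below. The function name and the example profiles are made up for illustration, and treating "more than 15%" as the cut-off (rather than "15% or more") is an assumption about how the threshold is applied.

```python
def ceiling_floor(profiles, threshold=0.15):
    """Share of respondents with the best (all 1s) and worst (all 5s) profile.

    Assumption: an effect is flagged when the share is strictly above the threshold.
    """
    n = len(profiles)
    ceiling = sum(all(level == 1 for level in p) for p in profiles) / n
    floor = sum(all(level == 5 for level in p) for p in profiles) / n
    return {
        "ceiling": ceiling,
        "floor": floor,
        "ceiling_effect": ceiling > threshold,
        "floor_effect": floor > threshold,
    }


# Made-up profiles: one respondent out of eight reports no problems at all.
profiles = [
    (1, 1, 1, 1, 1),
    (2, 1, 2, 3, 1),
    (1, 1, 2, 2, 2),
    (3, 2, 3, 4, 2),
    (1, 1, 1, 2, 1),
    (2, 1, 1, 2, 3),
    (1, 1, 2, 3, 1),
    (2, 2, 2, 2, 2),
]
result = ceiling_floor(profiles)
print(result["ceiling"], result["ceiling_effect"])  # 0.125 False
```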

Construct validity reflects the degree to which a measure really evaluates the construct to be measured [39]. We assessed the construct validity of the EQ-5D-Y-5L by exploring its convergent validity, divergent validity, and known-group validity. Convergent validity means that constructs that are theoretically the same or similar are in fact highly correlated with each other, while divergent validity means that constructs that are theoretically dissimilar are in fact not highly correlated with each other [40, 41]. Demonstrative figures on the theoretical relationship between the dimensions of the EQ-5D-Y-5L and the CHAQ are provided in Appendix 1. We used Spearman’s rank correlation (Spearman’s r) to explore the association between the EQ-5D-Y-5L and CHAQ dimensions. The strength of the correlation was interpreted using the following criteria: no (r < 0.2), weak (0.2 ≤ r < 0.35), moderate (0.35 ≤ r < 0.5), and strong (r ≥ 0.5) [42].
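As an illustration of this step, the sketch below computes Spearman's r as the Pearson correlation of average ranks (so that tied ordinal responses are handled) and applies the strength labels from the criteria above. The function names are hypothetical, and applying the cut-offs to the absolute value of r is an assumption made so that negative correlations can also be labelled. The sketch fails with a division by zero if one variable is constant.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def strength(r):
    """Label the correlation using the cut-offs in the text (on |r|, an assumption)."""
    size = abs(r)
    if size < 0.2:
        return "no"
    if size < 0.35:
        return "weak"
    if size < 0.5:
        return "moderate"
    return "strong"


print(strength(0.37), strength(0.33), strength(-0.65))  # moderate weak strong
```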

Known-group validity reflects whether a measure can distinguish between two or more groups that are known to differ in the variable of interest [43]. Known-group validity can be considered present when at least 75% of the results are in correspondence with the hypotheses, in groups of at least 50 patients [38]. Studies have found that suboptimal HRQL among patients with JIA was associated with non-oligoarthritis [37, 44,45,46], shorter disease duration [45, 47], severe disease activity [45, 48,49,50], higher level of pain [45, 48, 49], severe functional disability [45, 48, 49], and poor well-being [48, 49]. Based on these studies, we formed known groups from our study data, as presented in Appendix 2. We used one-way analysis of variance (ANOVA) to test the mean difference in EQ-5D-Y-5L level summary score across each known group. When the known group had three or more levels, each level was compared with the next less severe level (Footnote 3), and a Bonferroni correction was used in these post-hoc pairwise comparisons. In total, 17 hypotheses across 11 known-group variables were examined at a statistical significance level of p < 0.05; the 75% threshold noted above can be applied to these tests to draw conclusions about known-group validity. We also estimated effect sizes (Cohen’s d) to quantify the magnitude of the mean difference across each known group. The effect size was estimated as the mean difference divided by the pooled standard deviation. The magnitude of the effect size was interpreted as: 0.2 to 0.49, small; 0.5 to 0.79, moderate; and ≥ 0.8, large [51].
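The effect size and the 75% rule can be sketched as follows. The text does not give the pooled standard deviation formula, so the usual sample-size-weighted form is assumed here; the function names, the "negligible" label for values below 0.2, and the example scores are also assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled SD (weighted by degrees of freedom, an assumption)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def magnitude(d):
    """Interpret |d| with the cut-offs in the text; 'negligible' below 0.2 is our label."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


def known_group_validity(n_confirmed, n_hypotheses, threshold=0.75):
    """True when at least 75% of the hypotheses are confirmed."""
    return n_confirmed / n_hypotheses >= threshold


# Made-up level summary scores for two groups.
d = cohens_d([8, 10, 12], [5, 7, 9])
print(d, magnitude(d))  # 1.5 large
print(known_group_validity(15, 17))  # True
```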

Post-hoc power analysis indicated sufficient power to detect all effect sizes of interest (i.e., r = 0.35 for correlations between EQ-5D-Y-5L and CHAQ for construct validity and effect size (d) = 0.5 for known groups differences).

Informativity reflects the ability of a measure or item to discriminate between people with different characteristics measured by that item [52]. According to information theory, the informativity of an item is better when it has more categories and responses to this item are more evenly distributed among categories [53]. Shannon’s absolute index (H’) expresses the extent to which the information is evenly distributed across categories, and Shannon’s evenness index (J’, defined as H’/H’max, i.e., the use of the system (H’) given its potential (H’max)) expresses the relative evenness of the distribution, regardless of the number of categories [53, 54]. Shannon’s J’ ranges from 0 to 1, with 1 indicating that responses to the item are perfectly evenly distributed across categories.
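A minimal sketch of the two indices for a single dimension is given below. It assumes the standard Shannon entropy with a base-2 logarithm for H’ and H’max equal to the logarithm of the number of categories; the choice of base affects H’ but not J’. The function name and example responses are made up.

```python
from collections import Counter
from math import log2


def shannon_indices(responses, n_categories=5):
    """Return (H', J') for one item.

    H' = sum over used categories of p * log2(1 / p)   (base 2 is an assumption)
    J' = H' / H'max, with H'max = log2(n_categories)
    """
    n = len(responses)
    counts = Counter(responses)
    h = sum((c / n) * log2(n / c) for c in counts.values())
    h_max = log2(n_categories)
    return h, h / h_max


# Responses spread evenly over all five levels use the full potential of the item.
h_even, j_even = shannon_indices([1, 2, 3, 4, 5])

# Responses concentrated on one level carry no information.
h_single, j_single = shannon_indices([1, 1, 1, 1])

# Responses split evenly over only two of the five levels.
h_two, j_two = shannon_indices([1, 1, 2, 2])
print(h_two)  # 1.0
```

In this sketch a dimension where most respondents choose the same level yields a J’ close to 0, while an even spread across the five levels yields a J’ close to 1.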

Sensitivity analysis

Although the EQ-5D-Y-5L parent-proxy version is not recommended for children under 4 years old, this study recruited participants younger than 4 years and administered the instrument to them. In the primary analysis, we analyzed the psychometric properties among children aged 4 years and older, and in the sensitivity analysis, we re-ran the analysis including children younger than 4.

Results

Patient characteristics

In the base case, 407 patients were included, representing 467 patient visits. At entry to the UCAN CAN-DU study, patients had a median age of 12 years (interquartile range (IQR): 9–15), with 58% being female and 34% residing in Canada. Of all patient visits, 17% came from patients aged 4–7 years old; 45% were in the “new onset of JIA” cohort and 40% in the “starting biologics” cohort; 40% were characterized by the three oligoarticular subtypes and 25% by the two polyarticular subtypes; 41% had a disease duration of more than one year; 83% were characterized by an active disease status; 30% had morning joint stiffness (≥ 15 min); 31% had joint pain; and 59% were classified as having moderate (39%) to severe (20%) disease activity using cJADAS10 scores. The median active joint count was 2 (IQR: 1–5), and the median physician global assessment disease activity score was 2.5 (IQR: 1.3–4.0), with 0 indicating no activity. Among these variables, JIA subtypes (9%), disease duration (10%), active joint counts (2%), and cJADAS10 (23%) had missing values, with cJADAS10 having the highest missing rate (Table 1). Given that cJADAS10 measures disease activity, we used the binary disease activity variable to examine the characteristics of missing data, and we found that patients with inactive disease were more likely to have missing values in cJADAS10 (data not shown).

Table 1 Patient characteristics

Response pattern

The mean EQ-5D-Y-5L level summary score was 9.81 (SD = 3.81) and the median score was 9 (IQR: 7–12). The EQ-VAS score had an estimated mean of 70.67 (SD = 20.78) and a median of 74 (IQR: 58–88). The percentage of responses reporting the best or worst possible EQ-5D-Y-5L profiles was 13% and 0%, respectively, suggesting no ceiling or floor effects (Table 2). The “having pain or discomfort” dimension had the most problems reported (79%), followed by the “doing usual activities” (66%), “feeling worried, sad, or unhappy” (60%), and “mobility” (56%) dimensions. There were fewer problems in the “looking after myself” dimension (28%) compared to other dimensions (Fig. 1).

Table 2 Descriptive statistics of EQ-5D-Y-5L and CHAQ
Fig. 1 EQ-5D-Y-5L parent-proxy version and Childhood Health Assessment Questionnaire (CHAQ) parent-proxy version dimensional responses (Base case: 467 patient visits representing 407 patients)

The CHAQ disability index had a mean and median of 0.68 (SD = 0.62) and 0.5 (IQR: 0.1–1), respectively, and was strongly correlated with the EQ-5D-Y-5L level summary score (r = 0.77). Of all dimensions, patients reported the greatest disability in the “activities” dimension, with only 33% reporting no difficulty. The “dressing and grooming”, “arising”, “walking”, and “reach” dimensions had higher proportions reporting no difficulty (49–56%), with the “eating”, “hygiene”, and “grip” dimensions having the highest proportions reporting no difficulty (61–68%). The CHAQ pain index had an estimated mean of 41.44 (SD = 28.95) and median of 41 (IQR: 13–68), and the CHAQ health status index had an estimated mean of 37.77 (SD = 28.73) and median of 37 (IQR: 10–62). For both indices, 0 represents the best possible situation. The CHAQ health status index was strongly correlated with the EQ-VAS (r = −0.65) (Table 2; Fig. 1). The CHAQ had a missing rate of 21%. Characteristics of missing data were examined using the EQ-5D-Y-5L level summary score and EQ-VAS. Patients with missing values in the CHAQ tended to have lower EQ-5D-Y-5L level summary scores and higher EQ-VAS scores, indicating better HRQL status (data not shown).

Convergent and divergent validity

EQ-5D-Y-5L dimensions that theoretically measure similar or the same constructs as the CHAQ dimensions demonstrated strong associations (r = 0.59–0.74). The exception was the EQ-5D-Y-5L “doing usual activities” versus the CHAQ “grip” dimension, which was moderately correlated (r = 0.37, 95% confidence interval: 0.28–0.46). This suggests good convergent validity for the EQ-5D-Y-5L parent-proxy version. For EQ-5D-Y-5L dimensions that theoretically might be correlated with the CHAQ dimensions, we also observed moderate to strong associations (r = 0.41–0.65), except for EQ-5D-Y-5L “having pain or discomfort” versus the CHAQ “grip” dimension (r = 0.33, 95% confidence interval: 0.24–0.42) (Table 3).

Table 3 Convergent-divergent validity results: Correlation between EQ-5D-Y-5L and CHAQ dimensions, presented by “theoretically strongly correlated”, “theoretically moderately correlated”, and “theoretically weakly correlated or no correlation” categories (Base-case: 370 patient visits representing 329 patients)

For the EQ-5D-Y-5L dimensions that theoretically do not measure constructs similar to those of the CHAQ dimensions, there was no association or weak to moderate associations (r = 0.10–0.48). No EQ-5D-Y-5L dimension directly measures a construct similar to the CHAQ “eating” dimension, and only the “looking after myself” dimension might theoretically correlate with the “eating” dimension. The Spearman’s correlation between the “eating” dimension and each of the other four unrelated EQ-5D-Y-5L dimensions reflected no association or weak association (r = 0.10–0.23). This indicates good divergent validity (Table 3, Appendix 3).

Known-group validity

Table 4 displays mean differences of EQ-5D-Y-5L level summary scores across known groups, their effect sizes, and p-values. Importantly, the distribution of the level summary scores was not normal. However, we decided to report means, SDs, and parametric metrics for three reasons. First, our reasonably large sample size makes the violation of the assumption of normality less problematic [55]. Second, the parametric metrics reported are common in the current literature and readers are more likely to be familiar with them, easing interpretation. Finally, we reached the same conclusions when comparing metrics from Table 4 with its non-parametric version that reports medians, IQR, median difference, Cliff’s delta, and non-parametric p-values (Appendix 4).

Table 4 Known group analysis in terms of EQ-5D-Y-5L level summary scores

In the known-group analyses, patients with non-oligoarticular JIA, shorter disease duration, more disease activity, more pain, more functional disability, and worse well-being showed significantly higher EQ-5D-Y-5L level summary scores, indicating worse HRQL, compared to their counterparts. Fifteen of the 17 (88%) statistical tests were in accordance with the hypotheses at the p < 0.05 level, with 14 being significant at the p < 0.001 level, and one at the p = 0.005 level (“1–5 active joints” vs. “6–10 active joints”). Comparisons between “minimal disease activity” and “no active disease” (p = 0.325) and between “having joint pain” and “no joint pain” (p = 0.16) were not significant at the p < 0.05 level. The effect sizes of these mean differences were moderate to large in most of the known groups, especially in those related to disease activity, functional disability, and well-being. This suggests that the EQ-5D-Y-5L parent-proxy version can discriminate pre-specified known groups (Table 4).

Informativity

Shannon’s evenness indices for the “mobility”, “looking after myself”, “doing usual activities”, “having pain or discomfort”, and “feeling worried, sad, or unhappy” dimensions were 0.80, 0.52, 0.85, 0.88, and 0.74, respectively. The informativity of the “looking after myself” dimension was lower than that of the other dimensions (Table 5).

Table 5 Shannon’s absolute index (H’) and Shannon’s evenness index (J’)

Sensitivity analysis

In the sensitivity analyses, the sample had 472 patients representing 540 patient visits, with patients under 4 years old contributing 14% of patient visits. The patient characteristics of this sample were similar to the base case (Table 1). Slightly more problems were reported in the “looking after myself” dimension (33%) compared to the base case (28%), while the response pattern of the other dimensions and the summary score of the EQ-5D-Y-5L and EQ-VAS remained consistent (Table 2). The convergent, divergent, and known-group validity findings were also demonstrated in this sample (Appendix 3, Table 4). Compared with the base case, the Shannon’s evenness index showed similar values in all dimensions except the “looking after myself” dimension (0.62, versus base case 0.52) (Table 5).

Discussion

This is the first study exploring the psychometric properties of the EQ-5D-Y-5L parent-proxy version among JIA patients. We used a heterogeneous sample and found that the EQ-5D-Y-5L demonstrated good convergent and divergent validity versus the CHAQ parent-proxy version and the ability to discriminate known groups defined by disease characteristics, functional ability, pain, and well-being, supporting the future use of the EQ-5D-Y-5L to assess HRQL for JIA patients.

We had a robust sample size representing a diverse range of patients. The base-case analytical sample included data from 467 visits representing 407 patients, which is larger than in previous EQ-5D-Y-5L validation studies [20,21,22,23,24,25,26]. The sample size for the correlation analysis was 370 visits, and ranged from 360 to 467 for the known-group analyses. There was mild to moderate disease burden among the study population, based on various disease activity measures, and the functional ability, pain, and well-being assessed by the CHAQ. This is consistent with findings from other studies: the advancement of JIA management in recent years has led to a decrease in symptoms and disease activity and better overall assessment [16]. The HRQL measured by the EQ-5D-Y-5L reflects a similar pattern. The estimated mean and median of the EQ-5D-Y-5L level summary score were around 9, which represents 4 dimensions with mild problems, 2 dimensions with moderate problems, or 1 dimension with severe problems (with the rest of the dimensions having no problems).

We examined the convergent and divergent validity of the EQ-5D-Y-5L by comparing its dimensions with the CHAQ dimensions. Each CHAQ dimension assesses the capability of a patient to perform tasks in that functional area. Only the “eating” dimension did not have a directly corresponding dimension in the EQ-5D-Y-5L that measured similar or the same constructs. For other functional areas, although some seem not to be semantically linked to any EQ-5D-Y-5L dimension, e.g., “reach”, these dimensions cover a wide range of tasks defining the functional ability, and each dimension has one or more tasks reflecting constructs similar to those of the EQ-5D-Y-5L dimensions. The “reach” dimension of the CHAQ includes four tasks: (1) “Reach and get down a heavy object…”; (2) “Bend down to pick up…”; (3) “Pull on a sweater over his/her head”; (4) “Turn neck to look back over shoulder”. Based on the wording of these tasks, the “reach” dimension theoretically relates to the EQ-5D-Y-5L “looking after myself” and “doing usual activities” dimensions. Therefore, we were able to thoroughly analyze the convergent and divergent validity of the EQ-5D-Y-5L among JIA patients. The EQ-5D-Y-5L demonstrated good convergent and divergent validity. This shows that although the EQ-5D-Y-5L, a generic measure, cannot directly assess JIA-specific functional ability, worse (or better) EQ-5D-Y-5L dimensional responses likely indicate that patients experience more (or less) difficulty in the corresponding functional areas.

We performed extensive known-group analyses to assess whether the EQ-5D-Y-5L could differentiate patients with known differences in HRQL, and the EQ-5D-Y-5L performed well in the specified known groups. Of the statistical tests, 88% were in accordance with the pre-specified hypotheses, surpassing the 75% threshold and thus supporting known-group validity [38]. In post-hoc pairwise comparisons where the known group contains three or more categories, a Bonferroni correction was used, which reduced the chance of Type I error. Furthermore, the differences in HRQL across categories were significant at the p < 0.001 level in most of the known groups, together with a moderate to large effect size.

Validation studies of the EQ-5D-Y-5L (not in the JIA population) made head-to-head comparisons with the EQ-5D-Y-3L and showed that the EQ-5D-Y-5L can decrease the ceiling effect and increase informativity, compared with the EQ-5D-Y-3L [20,21,22,23,24, 26]. In our study, as data collection is still ongoing and only a limited sample has completed both instruments, we have not compared the 5L instrument with the 3L instrument; however, this can be explored in the next steps of this research. Using the 15% threshold [38], we found that the EQ-5D-Y-5L has no ceiling/floor effect when measuring HRQL among patients with JIA. In terms of informativity, the reported Shannon’s evenness index in other EQ-5D-Y-5L validation studies ranged from 0.10 to 0.73 [20, 21, 24], and our study population reported 0.52–0.88, indicating more even distributions. Based on the ceiling/floor effect and the informativity statistics, the EQ-5D-Y-5L parent-proxy version has the informational richness to assess JIA patients.

JIA can be diagnosed at a very young age, and our study did recruit some patients younger than 4 years old. For these patients, the EQ-5D-Y-5L is not recommended, as some dimensions might not be appropriate for infants and toddlers. As such, we excluded children aged 1–3 years old in the base case and included them in the sensitivity analysis. We did not observe many differences in the psychometric properties between the two samples. Patients under 4 years old accounted for 14% of the sample, so this subpopulation possibly did not affect the overall results in a significant way. According to another study that solely analyzed the performance of the EQ-5D-Y-3L proxy version among young children (aged 3–5), HRQL expressed by the summary scores of the EQ-5D-Y-3L worked well and known-group validity was demonstrated in this young population. However, the “looking after myself” dimension is problematic for children aged 3 years old due to age-related difficulties [56]. This is consistent with our findings, as we also identified more problems reported in the “looking after myself” dimension in the sensitivity analysis. The EuroQol Research Foundation is developing the EQ-TIPS [57], an instrument for infants aged 0–3 years old, to better measure HRQL among this population. The EQ-TIPS has an “eating” dimension, as eating is an important aspect of an infant and toddler’s life and a large focus of attention for parents [58]. Exploring the validity of the EQ-TIPS among JIA patients is warranted considering the age characteristics of this population.

This study has strengths and limitations. The study sample was collected from national JIA cohorts in Canada and the Netherlands, which include many pediatric rheumatology clinics in these two countries. Recruited patients had various disease subtypes, were at different disease progression stages, and had diverse treatment experiences. There was also a large proportion of patients with relatively severe disease activity and disability. This suggests good generalizability of our study to JIA patients in western countries. However, we did not have enough data from patients aged 4–7 years old to analyze this population separately. Patients aged 8 years and older dominated the study sample, and the psychometric performance of the EQ-5D-Y-5L would be determined more by the older children. Therefore, the psychometric properties we observed might not provide the true picture among younger children. Also, the CHAQ does not have a dimension regarding mental health. We were unable to assess whether the “feeling worried, sad, or unhappy” dimension of the EQ-5D-Y-5L is valid for measuring mental health conditions of the JIA population, although this dimension had good divergent validity against all CHAQ dimensions.

Some clinical variables and the CHAQ used in the validation study had missing values. The cJADAS10 disease activity score (23% missing) and the three CHAQ indices (disability index (21%), health status (21%), and pain index (21%)) were the variables that had the most missing values. We used the complete data for each of those variables when performing the analyses involving them, and the sample size still ranged from 360 to 467.

The parent proxy is a recommended and common type of surrogate to rate functional disability and disease activity among the JIA population, especially for children younger than 8 years old [29]. When comparing with the CHAQ parent-proxy version, we found that the EQ-5D-Y-5L parent-proxy version is a valid instrument to measure HRQL of the JIA population. However, it is unclear whether the EQ-5D-Y-5L parent-proxy version performs better or worse compared to the self-reported version. In other pediatric disease areas, findings were diverse in terms of the agreement between the parent-proxy version and self-reported version of a PROM [59, 60]. It is important to examine the parent-patient agreement of the EQ-5D-Y-5L among the JIA population in future research.

Conclusion

This research indicates that the EQ-5D-Y-5L proxy version can be appropriately used among patients with JIA, as it demonstrated a low ceiling effect, good construct validity, and good informativity. Future research should explore its responsiveness and compare its psychometric performance with that of the EQ-5D-Y-3L, the EQ-TIPS, and other JIA-specific HRQL measures.