Key Points for Decision Makers

The EQ-HWB-S is a new nine-item instrument designed for measuring health, social care and carer-related quality of life for use in economic evaluation.

We assess the validity of the EQ-HWB-S in a population of parents of children with and without health conditions, using pilot UK preference weights.

Parents of children with health conditions had poorer wellbeing scores on all items than the general population sample. On all tests, the EQ-HWB-S was shown to be a valid instrument in this population.

1 Introduction

Appropriate and relevant measurement of the health and wellbeing of patients is a priority in the evaluation of health interventions and plays a key role in economic evaluation. A body of research is now building around the effects of caregiving on the health and wellbeing of informal caregivers, and around the importance of including these effects in decisions about funding new health technologies [1]. Examples include declining health when a partner is unwell; fatigue and anxiety from caring for a person living with dementia; and the effects of caring for a child with a disability or health condition on parents’ ability to work, sleep and care for themselves and their other children [2]. These effects are often described as ‘spillover effects’, and there is a strong case for accounting for them in economic evaluation, where adopting a societal or ‘welfare’ perspective is now considered recommended practice [1].

Including carers’ direct health effects has been recommended by the National Institute for Health and Care Excellence (NICE) since 2013 [3], and there is sufficient evidence that caregiver effects are measurable [4]. Despite this, a recent paper found that family/caregiver spillover effects were rarely included in technology appraisals and highly specialised technologies (HST) guidance [5]. It is increasingly recognised that overlooking spillover effects risks undervaluing interventions [6].

To include spillover effects for caregivers, appropriate instruments for measuring caregiver wellbeing are required [7]. To this end, a new instrument, the EuroQol Health and Wellbeing (EQ-HWB) instrument, has been developed to measure health and wellbeing relevant to health care, social care and caregiving [7, 8]. The instrument is preference weighted, and thus suitable for economic analysis, and was developed for use in caregiver (and other) populations. It may therefore be a useful measure for quantifying spillover effects in economic evaluation, which makes it essential to evaluate it in a range of caregiver populations. The domains and items included in the EQ-HWB instruments were generated predominantly through a large literature review of qualitative work exploring how patients, social care service users and carers report that their conditions, circumstances and related interventions affect their wellbeing (it is unclear whether caregivers of children with health conditions were included in the development work) [9, 10, 11].

There are two EQ-HWB instruments: a 25-item instrument (EQ-HWB) and a 9-item instrument [EQ-HWB Short Version (EQ-HWB-S)], which is intended for economic evaluation. The nine-item EQ-HWB-S was chosen for the current study because it is shorter and because we expected preference weights to become available soon [12]. The EQ-HWB-S is currently available for research use only, as it is still considered experimental [13]. It could be considered a ‘cross-over’ instrument that captures elements of health and wellbeing relevant to users of healthcare and/or social care [9]. The instrument thus goes beyond the more purely health-related aspects measured by other health-related quality of life instruments (such as the EQ-5D or the SF-6D [14]).

The generation of the items in the two EQ-HWB instruments has been documented [9, 11, 15], and face validity studies have been conducted in six countries: Argentina, Australia, China, Germany, the UK and the USA [16]. Although it was challenging to ensure validity for every group during development, both the conceptual framework [11] and the face validity work [16] included carers. Nevertheless, evidence of validity across different groups of carers in different countries is still needed, and such evidence is now beginning to emerge for the EQ-HWB-S. In a study investigating overlap and differences between the EQ-5D and the EQ-HWB-S [17], both instruments distinguished between participants with and without physical and mental health conditions, but only the EQ-HWB-S items distinguished between caregivers and non-caregivers, or between groups with high and low caregiver burden.

Evidence on the use of the EQ-HWB-S in caregiver populations is still very limited, and no research has yet validated the EQ-HWB-S in caregivers of children with health conditions. Having a disabled child is associated with a higher risk of parental mental health problems and poorer quality of life compared with parents of healthy children [18]. Caring for children living with autism spectrum disorder [19] or cerebral palsy [20] has also been found to have a negative impact on quality of life. Some EQ-HWB-S items may be particularly pertinent to caregivers of children with health conditions, as these caregivers have been found to have higher levels of exhaustion [21, 22], loneliness [23], mental health problems (anxiety and depression) and cognition issues, and a lower sense of control [24]. Research into how well the EQ-HWB-S measures health and wellbeing in parents of children with health conditions is therefore an important next step towards validating the instrument for caregivers/parents of children living with health conditions.

The aim of the current study is to investigate the validity of the EQ-HWB-S in parents of children with and without health conditions, and to assess whether the instrument may be suitable for measuring the health and wellbeing of caregivers in this population. Specifically, we aimed to assess: (i) response distributions in the health condition samples compared with the general population; (ii) the ability of the instrument to distinguish between known groups defined by child health (parents of children in the child health condition samples, of children with special healthcare needs and of children with a parent-reported chronic health condition); and (iii) EQ-HWB-S item scores across a range of child health conditions. As a robustness check, we also included in the known group analysis whether coronavirus disease 2019 (COVID-19) had negatively affected parents’ wellbeing, to confirm whether the EQ-HWB-S could detect differences in parents’ own wellbeing in addition to the impact of caring. Few parent-specific variables were available in this large dataset, which focused on children’s health-related quality of life; this variable was included because the data were collected during extended COVID-19 lockdowns in Melbourne.

2 Methods

2.1 Research Design

Data for the study were obtained from the Quality Of Life in Kids: Key Evidence to Strengthen Decisions in Australia (QUOKKA) Research Program’s Paediatric Multi-Instrument Comparison (P-MIC) study [25]. The P-MIC study was conducted in 2021–2023 to provide new evidence on the performance of paediatric generic and disease-specific health-related quality of life (HRQoL) measures by comparing their psychometric properties, with a view to informing their use in economic evaluation and routine hospital care. During the development stage of the P-MIC study, the QUOKKA Consumer Advisory Group highlighted the importance of also measuring caregivers’ quality of life. The EQ-HWB-S was subsequently included in the P-MIC survey to measure caregiver wellbeing and thus enhance the richness of the data. We used P-MIC data from Data Cut 2 (dated 10 August 2022), which includes approximately 94% of the total planned P-MIC participants.

Data were collected through an online panel (PureProfile) and from a sample recruited at a paediatric tertiary hospital in Melbourne, Australia (see the technical methods paper for further details on the study methodology [26]). The online panel contained eight targeted samples of children with specific health conditions; if none of a participant’s children had any of these conditions, the participant was assigned to the ‘general population’ group. The hospital sample included health conditions ranging from acute to chronic, drawn from a wide range of hospital departments, including intensive care, short stay and outpatient clinics. The study was approved by The Royal Children’s Hospital (RCH) Human Research Ethics Committee (HREC/71872/RCHM-2021) and was prospectively registered with the Australia New Zealand Clinical Trials Registry (ANZCTR—ACTRN12621000657820).

2.2 Participants

The full sample included 6787 caregivers (one caregiver per child). We restricted the sample to parents in order to focus on the psychometric performance of the instrument in a relatively homogeneous group of caregivers, thus excluding respondents who were grandparents (n = 104), unrelated carers (n = 80), siblings (n = 43) or other relatives (n = 43). We also excluded cases in which children were recruited through the hospital but may not have had a health condition (n = 238). In the hospital sample, if the child presenting to hospital was outside the age range for the condition, parents were asked to complete the questionnaire for another child within the age range, with an emphasis on rarer and more severe health conditions if there was more than one child to choose from; we removed these cases, as it was unclear whether they belonged in a health condition or a general population sample. In total, we excluded 530 survey responses (some participants fell into more than one category), giving a final sample of 6257 survey responses from parents of children aged 2 to 18 years. Of these, 4432 (70.83%) were parents of children with a health condition and 1825 (29.17%) were from the general population. The conditions included in the online panel were recurrent abdominal pain (377, 6.0%), attention deficit/hyperactivity disorder (ADHD; 505, 8.1%), anxiety or depression (452, 7.2%), autism spectrum disorder (ASD; 476, 7.6%), asthma (465, 7.4%), dental problems (472, 7.5%), eating disorders (168, 2.7%), epilepsy (272, 4.4%) and sleep issues (442, 7.1%). The hospital sample contained 807 participants, whose study child may have had any of a range of medical conditions (condition types are presented in Table S1). Results were robust to excluding the hospital sample, so we retained it in the analysis. Further information on study recruitment can be found in the published protocol and technical methods paper [25, 26].

2.3 Study Variables

Demographic characteristics for the study child included whether a language other than English was spoken at home, child gender (female, male, other), whether the parent identified the child as being of Aboriginal or Torres Strait Islander descent, and whether the child had a disability, special healthcare needs or a chronic health condition. Demographic characteristics for parents included gender (female, male, other), household weekly income (four brackets), education level (four groups), whether the parent held a government healthcare card (these cards are means tested and issued to people who meet relevant criteria, such as receiving payments or supplements, having a low income or meeting age requirements), number of adults in the home and whether the parent was negatively affected by COVID-19.

Each of the nine EQ-HWB-S items is rated on a five-level response scale [12] (scored 1 to 5), and participants are asked to consider the last 7 days. The nine items cover (1) getting around inside and outside (mobility), (2) ability to do day-to-day activities (activities), (3) feeling exhausted (exhaustion), (4) feeling lonely (loneliness), (5) having trouble concentrating or thinking clearly (cognition), (6) feeling anxious (anxiety), (7) feeling sad or depressed (sad/depressed), (8) feeling as though one has no control over day-to-day life (control) and (9) how much pain they had experienced over the last 7 days (pain). A pilot value set based on the preferences of the UK public has been developed to score the instrument [12]. As an Australian value set is not currently available, this study uses two methods to score the EQ-HWB-S: (i) the UK pilot preference weights, with possible values ranging from −0.384 for the worst state described by the instrument to 1 for the best state; and (ii) a simple level sum-score, which adds the response levels across the nine items (1 for the best response to 5 for the worst), giving scores between 9 and 45 (hereafter ‘sum-scores’). Item questions are presented in Table S2.
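As a minimal sketch of the level sum-score described above, the Python function below adds the nine response levels. It assumes each response is already coded 1 (best) to 5 (worst); the item keys are illustrative names, not official variable names. The UK pilot preference-weighted score is deliberately not reproduced, because it requires the published value set [12].

```python
# Sketch of the EQ-HWB-S level sum-score (assumption: levels coded 1 = best, 5 = worst).
# Item keys are hypothetical labels in the order the items are listed in the text.
ITEMS = (
    "mobility", "activities", "exhaustion", "loneliness", "cognition",
    "anxiety", "sad_depressed", "control", "pain",
)


def level_sum_score(responses: dict[str, int]) -> int:
    """Sum the nine response levels; the result ranges from 9 (best) to 45 (worst)."""
    missing = [item for item in ITEMS if item not in responses]
    if missing:
        raise ValueError(f"missing items: {missing}")
    total = 0
    for item in ITEMS:
        level = responses[item]
        if not 1 <= level <= 5:
            raise ValueError(f"{item}: level {level} is outside 1-5")
        total += level
    return total


best = {item: 1 for item in ITEMS}
worst = {item: 5 for item in ITEMS}
print(level_sum_score(best), level_sum_score(worst))  # 9 45
```

Higher sum-scores indicate worse wellbeing, which is the opposite direction to the preference-weighted score.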

2.3.1 Study Variables for Known Group Validity Tests

Differences between known groups were hypothesised a priori by the study team. For the child health conditions, we expected lower EQ-HWB-S preference-weighted scores/higher sum-scores (lower wellbeing) for parents of children in the child health condition samples, with special healthcare needs and where parents reported that the child had a chronic health condition compared with their counterparts. These hypotheses are in line with the QUOKKA statistical analysis plan [26]. For the parent-specific condition, we expected that, where parents reported that COVID-19 had negatively affected them, their wellbeing would be lower compared with their counterparts not reporting COVID-19 impacts.

2.3.1.1 Child Health Condition Sample

The QUOKKA technical methods paper [26] defines the following samples: (1) recruited via hospital, (2) general population recruited via online panel and (3) health-condition-specific groups recruited via online panels. Parents of children from the health condition samples (Samples 1 and 3 combined) were compared with parents of children from the general population (Sample 2).

2.3.1.2 Child Special Healthcare Needs

Parents of children with a special healthcare need, as identified by the previously validated Child Special Healthcare Needs Screener (CSHNS) [27], were compared with parents of children without a special healthcare need. The CSHNS questions included in the survey were: (1) “Does the Study Child currently need or use medicine prescribed by a doctor (other than vitamins)?” and (2) “Does the Study Child need or use more medical care, mental health or educational services than is usual for most children of the same age?” Each of these two questions was followed by: “Is this because of ANY medical, behavioural, or other health conditions? Is this a condition that has lasted or is expected to last for at least 12 months?” [27]. Children whose parent answered “yes” to at least one of the first two questions and “yes” to both of its follow-up questions were classified as having a special healthcare need.
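The screener rule can be expressed as a short function. This is a sketch, assuming each screening question counts only when both of its own follow-up questions are also answered “yes”; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScreenerItem:
    """One CSHNS screening question and its two follow-up questions (hypothetical names)."""
    needs_service: bool      # "yes" to the screening question itself
    due_to_condition: bool   # "Is this because of ANY medical, behavioural, or other health conditions?"
    lasts_12_months: bool    # "...lasted or is expected to last for at least 12 months?"

    def qualifies(self) -> bool:
        # The screening question counts only if both follow-ups are also "yes".
        return self.needs_service and self.due_to_condition and self.lasts_12_months


def has_special_healthcare_need(medicine: ScreenerItem, services: ScreenerItem) -> bool:
    """A child qualifies if at least one screening question qualifies."""
    return medicine.qualifies() or services.qualifies()
```

For example, a child who uses prescribed medicine for a condition expected to last over 12 months qualifies even if the services question is answered “no”.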

2.3.1.3 Parent-Reported Child Chronic Health Condition

Parents of children with a parent-reported chronic health condition or disability, defined as a condition that has lasted or is expected to last at least 6 months (“Does the Study Child have any medical conditions or disabilities that have lasted or are likely to last for six months or more?”), were compared with parents of children without a chronic health condition.

2.3.1.4 Impact of COVID-19 on Parent

Parents were asked the following question in the survey: “Is your (the parent’s) quality-of-life currently affected by the COVID-19 pandemic?”. Parents who responded “Yes, in a bad way” were compared with parents who responded “Yes, in a good way”, “No, not at all”, or “I’m not sure”. This was assessed as a binary variable.

2.4 Statistical Analysis

All analyses were performed in Stata version 15 (StataCorp, College Station, TX, USA). Participants had to answer every question in the online survey before proceeding, so there were no missing responses.

2.4.1 Baseline Characteristics and Response Distribution

Demographic variables were compared between the child health condition samples and the general population sample using chi-square tests for categorical data and t-tests for continuous, normally distributed data. Response distributions were calculated for the full sample, the health condition samples and the general population sample as the number and percentage of each response level for each item. We expected items 1 (mobility), 2 (activities) and 9 (pain) to have more skewed distributions than items 3–8 (exhaustion, loneliness, cognition, anxiety, sad/depressed and control), as we expected parents of children with health conditions to be less likely to experience problems with mobility, activities and pain than with exhaustion, loneliness, cognition, anxiety, sadness/depression and control. We also expected the health condition samples to have lower preference-weighted scores and higher sum-scores (worse wellbeing) than the general population sample.
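The count-and-percentage tabulation for a single item might look like the following sketch, which assumes responses are stored as integers 1 to 5; the function name is hypothetical.

```python
from collections import Counter


def response_distribution(responses: list[int], levels: range = range(1, 6)) -> dict[int, tuple[int, float]]:
    """Return {level: (count, percentage)} for one EQ-HWB-S item.

    Levels that nobody selected are still reported, with a count of 0.
    """
    counts = Counter(responses)
    n = len(responses)
    return {level: (counts[level], 100 * counts[level] / n) for level in levels}
```

Running it separately on the full, general population and health condition samples gives the three panels of a frequency table.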

2.4.2 Known Group Validity by Items

We conducted non-parametric (Mann–Whitney) and parametric (t-test) known group comparisons of the health condition samples and the general population sample for each EQ-HWB-S item. Non-parametric tests were used because responses were not normally distributed, and t-tests were included for interpretability. We expected the health condition samples to have higher mean response scores than the general population sample, and these differences to be larger for exhaustion [21, 22], loneliness [23], mental health problems (anxiety and depression), cognition issues and lower sense of control [24].
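To show what the Mann–Whitney comparison computes, the sketch below implements the U statistic with average ranks for tied values and a tie-corrected normal approximation for the two-sided p-value. It is illustrative only (no continuity correction is applied, and the study's analyses were run in Stata); a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be used instead. The tie correction matters here because five-level item responses are heavily tied.

```python
import math


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U for sample x, two-sided p) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0  # accumulates sum(t^3 - t) over groups of tied values
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based; ties share the average rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p_two_sided
```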

To investigate differences in the probability of selecting each response option between the health condition and general population samples, we fitted nine multivariable ordered probit regressions (one per item) and estimated average partial effects, controlling for caregiver age, gender (two groups), number of children in the household (1, 2, 3 or more), single parent in household (yes/no), weekly income (four groups), caregiver education level (four groups) and receipt of a government healthcare card (yes/no) [28]. Because there were few observations in the more severe response levels, we aggregated the moderate, severe and extreme/unable categories (responses 3 to 5) into a single third category, following Henry and Cullinan (2021) [28].
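The average partial effect of group membership in a three-category ordered probit can be sketched as below, assuming the coefficients and cut-points have already been estimated (in Stata this is typically done with `oprobit` followed by `margins`). The function names and any input values are hypothetical; the sketch only shows how levels 3 to 5 are collapsed and how predicted category probabilities are differenced and averaged over respondents.

```python
import math


def collapse_level(level: int) -> int:
    """Map response levels 1-5 to categories 1, 2, 3 (levels 3-5 combined)."""
    return min(level, 3)


def norm_cdf(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def category_probs(xb: float, cuts: tuple[float, float]) -> tuple[float, float, float]:
    """P(y=1), P(y=2), P(y=3) for latent y* = xb + e, e ~ N(0, 1), with cut-points c1 < c2."""
    c1, c2 = cuts
    p1 = norm_cdf(c1 - xb)
    p2 = norm_cdf(c2 - xb) - p1
    p3 = 1.0 - norm_cdf(c2 - xb)
    return p1, p2, p3


def ape_binary(other_xb: list[float], beta_d: float, cuts: tuple[float, float]) -> list[float]:
    """Average change in each category probability when a 0/1 indicator switches from 0 to 1.

    other_xb holds each respondent's linear index excluding the indicator's term.
    """
    diffs = [0.0, 0.0, 0.0]
    for xb in other_xb:
        without = category_probs(xb, cuts)
        with_d = category_probs(xb + beta_d, cuts)
        for k in range(3):
            diffs[k] += with_d[k] - without[k]
    return [d / len(other_xb) for d in diffs]
```

Because the three category probabilities always sum to one, the three partial effects for an item sum to zero; a positive indicator coefficient (worse responses) lowers the probability of the best category and raises that of the worst.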

2.4.3 Known Group Validity for Preference-Weighted Scores and Sum-Scores by a Range of Variables

Known groups were compared on EQ-HWB-S sum-scores and preference-weighted scores using independent t-tests, with Cohen’s d as the measure of effect size [26], for the four known-group variables described in Section 2.3.1. Following Cohen (1992) [29], effect sizes of 0.2–0.49 were considered small, 0.5–0.79 moderate and ≥ 0.8 large.
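The sketch below shows the pooled-variance independent t statistic and Cohen's d, together with the size labels given above. It is a minimal standard-library illustration with function names of our own choosing; it returns the t statistic and degrees of freedom but not a p-value.

```python
import math
import statistics


def pooled_sd(x: list[float], y: list[float]) -> float:
    """Pooled standard deviation of two independent samples."""
    n1, n2 = len(x), len(y)
    s1, s2 = statistics.variance(x), statistics.variance(y)
    return math.sqrt(((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2))


def cohens_d(x: list[float], y: list[float]) -> float:
    """Mean difference divided by the pooled standard deviation."""
    return (statistics.mean(x) - statistics.mean(y)) / pooled_sd(x, y)


def student_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Equal-variance independent t statistic and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    se = pooled_sd(x, y) * math.sqrt(1 / n1 + 1 / n2)
    return (statistics.mean(x) - statistics.mean(y)) / se, n1 + n2 - 2


def effect_size_label(d: float) -> str:
    """Cohen's (1992) thresholds as used in this study."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "below small"
```

With group sizes of 4432 and 1825, the pooled test has 4432 + 1825 − 2 = 6255 degrees of freedom, matching the t (6255) statistics reported in Section 3.1.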

2.4.4 Known Group Validity—Subgroup Analyses by Parent Gender

Because the sample was predominantly female, we investigated whether the instrument was also valid for male parents [we removed the 23 participants (0.34%) who reported a gender other than female or male, as this group was too small to analyse]. We calculated Cohen’s d separately for female and male parents to compare effect sizes between them. We used gender rather than sex, as this is how the question was worded in the survey.

2.4.5 Known Group Validity—Controlling for Demographic Covariates

We used linear regression to investigate whether the differences observed between these groups held when controlling for parent age and gender, number of children in the household, sole parent status, weekly income, parent education level and receipt of a government healthcare card (as a marker of adversity), as presented in the baseline characteristics table.
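As a rough sketch of what “controlling for” covariates means here, the function below fits ordinary least squares via the normal equations using only the standard library; the coefficient on a 0/1 group indicator is then the group difference adjusted for the other covariates. This is illustrative (the study used Stata), the names and toy data are hypothetical, and a numerical library would normally be used for real data.

```python
def ols(X: list[list[float]], y: list[float]) -> list[float]:
    """OLS coefficients from the normal equations (X'X) b = X'y.

    X must already contain a leading column of ones for the intercept.
    Solved by Gauss-Jordan elimination with partial pivoting.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: use the row with the largest absolute entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; check for perfectly collinear covariates")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def r_squared(X: list[list[float]], y: list[float], coef: list[float]) -> float:
    """Proportion of variance in y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: intercept, group indicator d, covariate z, with y = 2 + 3d - 0.5z exactly.
rows = [(0, 1), (1, 2), (0, 3), (1, 5), (0, 4), (1, 1)]
X = [[1.0, d, z] for d, z in rows]
y = [2 + 3 * d - 0.5 * z for d, z in rows]
coef = ols(X, y)  # approximately [2.0, 3.0, -0.5]
```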

2.4.6 Comparison Across Health Conditions

We compared EQ-HWB-S item scores across the child health condition samples using means and standard deviations for each health condition, presented in a table ordered by descending EQ-HWB-S sum-score. Separate regression analyses were conducted for preference-weighted scores and sum-scores, with each health condition sample dummy-coded, to estimate the association of each health condition sample with the outcome relative to the general population. Beta values were ordered by absolute magnitude, and the ordering was compared between preference-weighted scores and sum-scores.
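Dummy coding against the general population reference, and ordering the resulting betas by absolute size, can be sketched as follows. The labels are hypothetical. Ordering by absolute value allows negative betas (preference-weighted scores, where lower means worse) to be compared directly with positive ones (sum-scores, where higher means worse).

```python
def dummy_code(labels: list[str], reference: str) -> tuple[list[str], list[list[int]]]:
    """One 0/1 column per non-reference group; the reference group is all zeros."""
    levels = sorted(set(labels) - {reference})
    rows = [[1 if label == level else 0 for level in levels] for label in labels]
    return levels, rows


def rank_by_abs(betas: dict[str, float]) -> list[str]:
    """Group names ordered from the largest to the smallest absolute beta."""
    return sorted(betas, key=lambda name: abs(betas[name]), reverse=True)


# Preference-weighted betas as reported in Section 3.7 (general population = reference).
pw_betas = {
    "dental": -0.027, "asthma": -0.033, "hospital": -0.058, "epilepsy": -0.093,
    "ADHD": -0.101, "anxiety/depression": -0.109, "sleep": -0.121,
    "recurrent abdominal pain": -0.125, "eating disorder": -0.174, "ASD": -0.177,
}
print(rank_by_abs(pw_betas)[:2])  # ['ASD', 'eating disorder']
```

Applied to all ten betas, `rank_by_abs` returns the conditions in the order listed in Section 3.7.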

3 Results

3.1 Baseline Characteristics

Baseline characteristics for the general population sample and the child health condition groups are presented in Table 1. Compared with the child health condition group, parents in the general population sample were significantly more likely to have a child who spoke a language other than English at home, slightly more likely to have a female child and less likely to have a child with a disability. They were also more likely to be female, to be in a higher income bracket and to be more highly educated, and less likely to hold a government healthcare card or to be a single parent. Children were slightly older in the child health condition samples, at 10.15 years [standard deviation (SD) = 4.27], than in the general population sample, at 9.69 years (SD = 4.88; t (6255) = 3.69, p < 0.001), because the eligible age range was often older for the child health condition groups (e.g. eating disorders). Child age ranged from 2 to 18 years. Parents were slightly younger in the child health condition samples, at 39.60 years (SD = 7.74), than in the general population sample, at 40.45 years (SD = 8.75; t (6255) = 3.79, p < 0.001). Parent ages ranged from 18 to 74 years.

Table 1 Baseline characteristics by health condition and general population samples

3.2 Response Distribution

Response distributions for the nine EQ-HWB-S items are shown in Table 2 for the full, general population and health condition samples. As expected, responses were more widely distributed for items 3–8 (exhaustion, loneliness, cognition, anxiety, sad/depressed and control) than for items 1 (mobility), 2 (activities) and 9 (pain) in all samples. In the full sample, the mobility item was highly skewed, with 85% of participants reporting no difficulty with mobility. The activities item (item 2) was also skewed, with 64% of participants reporting no difficulty with their day-to-day activities. For item 9 (pain), only 35.8% of participants reported no physical pain. For items 3–8, many participants selected one of the two most severe responses: exhaustion (37.9%), loneliness (17.5%), cognition (22.4%), anxiety (26.1%), sad/depressed (17.3%) and control (16.1%). Only 8.8% of participants reported being exhausted none of the time. As expected, responses were more widely spread in the health condition samples than in the general population sample. The general population and health condition samples are presented graphically in Fig. 1.

Table 2 Frequency of observations by EQ-HWB-S item for full, general population and child health condition samples (N = 6257)
Fig. 1

a Distribution of EQ-HWB-S item responses for the general population sample (legend numbers correspond to the responses shown in Table 2). b Distribution of EQ-HWB-S item responses for the health condition samples (legend numbers correspond to the responses shown in Table 2).

3.3 Known Group Validity—Items

We conducted non-parametric (Mann–Whitney) and parametric (t-test) comparisons of item scores between the health condition samples and the general population sample. The two sets of tests gave comparable results, so we present the t-tests in Table 3, as they are more easily interpretable (non-parametric tests are included in Table S3). As expected, there were significant differences between the health condition samples and the general population sample on all items. We had hypothesised that mean differences would be greater for exhaustion, loneliness, cognition, mental health (anxiety and sad/depressed) and control than for mobility, activities and pain. Mean differences were 0.45–0.62 for exhaustion, loneliness, cognition, mental health and control, compared with 0.12–0.36 for mobility, activities and pain, confirming this hypothesis.

Table 3 Known group comparison analysis for EQ-HWB-S items by health condition samples (n = 4432) and the general population sample (n = 1825) and estimated partial effects across EQ-HWB-S items

The estimated average partial effects for the health condition versus general population samples are presented in the final columns of Table 3. The key finding from this analysis was a consistent tendency, across all dimensions, for the health condition samples to have worse item scores than the general population sample after controlling for key caregiver demographics. For instance, respondents in the health condition samples were 7.8 percentage points (ppts) less likely to report no difficulty with mobility, 3.0 ppts more likely to report slight difficulty and 4.8 ppts more likely to report some difficulty, a lot of difficulty or being unable. Similar independent associations were found across all domains. The partial effects from the ordered probit models suggest that the largest increases in the probability of being in the third (most severe) category were for cognition, exhaustion and anxiety.
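Because the three collapsed response categories are exhaustive, the average partial effects for any item must sum to zero across categories; this provides a simple consistency check on the mobility figures above (d denotes membership of the health condition samples):

```latex
\sum_{k=1}^{3} \mathrm{APE}_k
  = \sum_{k=1}^{3} \left[ \Pr(y = k \mid d = 1) - \Pr(y = k \mid d = 0) \right]
  = 1 - 1 = 0,
\qquad
-7.8 + 3.0 + 4.8 = 0 \ \text{ppts}.
```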

3.4 Known Group Validity—Preference-Weighted Scores and Sum-Scores

In all t-tests with the EQ-HWB-S preference-weighted score or sum-score as the outcome, there were significant differences between known groups in the hypothesised directions (p < 0.001), as shown in Table 4 for the whole sample and for the subgroups of women and men. Effect sizes (Cohen’s d) were moderate for the three child health variables [29] and small for parent quality of life (QoL) affected by COVID-19. For the child health variables, effect sizes were slightly higher for women (0.45–0.50) than for men (0.40–0.50). The effect size for COVID-19 impact was also larger for women (0.40) than for men (0.32). Histograms of the EQ-HWB-S preference-weighted score and sum-score distributions are shown in Figs. S1 and S2, respectively.

Table 4 Known group comparison analysis for total sample, women and men caregivers for EQ-HWB-S preference-weighted scores and sum-scores, over four known groups

3.5 Known Group Validity—Subgroup Analyses by Parent Gender

Effect sizes were similar between female and male parents, with slightly lower Cohen’s d scores for male parents on all variables (Table 4).

3.6 Known Group Validity—Controlling for Demographic Covariates

Multiple linear regression was used to test whether the demographic variables found to differ significantly between groups in Table 1 were associated with EQ-HWB-S preference-weighted scores and sum-scores (the dependent variables). All independent variables were retained in the models, and R² was used to assess model fit. We checked for multicollinearity by examining correlations among all independent variables; none exceeded 0.80 (income and healthcare card were correlated at 0.55). The results for the EQ-HWB-S preference-weighted score and sum-score regression models are presented in Table 5. Child health condition sample (child health condition samples versus general population), parent age, parent gender (male, female), single-parent household, weekly income (four groups) and government healthcare card (yes/no) were significantly associated with EQ-HWB-S preference-weighted scores (F (10, 6082) = 96.53, p < 0.001, R² = 0.14). Holding a government healthcare card and having a child in the health condition samples (compared with the general population) had the largest associations with the EQ-HWB-S preference-weighted score, thus confirming our hypotheses. Further, higher income was associated with higher parent EQ-HWB-S preference-weighted scores (higher wellbeing). The analysis using sum-scores, shown in the second section of the table, gave similar results (F (10, 6082) = 111.13, p < 0.001, R² = 0.15).
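The pairwise correlation screen described above can be sketched as follows, flagging any pair of independent variables whose absolute Pearson correlation exceeds 0.80. The variable names and data in any usage are hypothetical. Pairwise correlations only reveal collinearity between two variables at a time; variance inflation factors are a common complement.

```python
import itertools
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_high_correlations(
    columns: dict[str, list[float]], threshold: float = 0.80
) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in itertools.combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged
```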

Table 5 Regression analysis summary for parents’ EQ-HWB-S sum-scores and EQ-HWB-S preference-weighted scores

3.7 Comparison Across Health Conditions

EQ-HWB-S items, preference-weighted scores and sum-scores were compared across all eight child health condition samples, the hospital sample and the general population sample. Table 6 shows means and standard deviations for the EQ-HWB-S items, ordered by descending sum-score. Overall, parents of children with ASD, eating disorders and sleep problems had higher mean EQ-HWB-S item scores and sum-scores (lower wellbeing), whilst parents in the general population, dental problems, asthma and hospital samples had lower item scores and sum-scores (higher wellbeing). The items with the largest wellbeing impacts were exhaustion, cognition and anxiety. Fig. 2 shows a similar pattern across items for the health condition samples and the general population, although the general population sample has lower EQ-HWB-S item scores overall (higher wellbeing).

Table 6 Means of parent EQ-HWB-S item scores, sum-scores and preference-weighted scores for child health condition samples, ranked by EQ-HWB-S sum-score
Fig. 2

Mean parent EQ-HWB-S scores by item for child health condition and general population samples

Table S5 shows the regression analyses with dummy-coded health condition samples, which estimate the association of each sample with EQ-HWB-S scores relative to the general population. Ordered by absolute beta value, the conditions for preference-weighted scores were: autism spectrum disorder (beta = −0.177), eating disorder (−0.174), recurrent abdominal pain (−0.125), sleep disorder (−0.121), anxiety/depression (−0.109), ADHD (−0.101), epilepsy (−0.093), hospital sample (multiple conditions; −0.058), asthma (−0.033) and dental (−0.027), as shown in Table S6. The relative ordering of conditions differed only slightly between preference-weighted scores and sum-scores: with both scoring methods, autism spectrum disorder, eating disorders, recurrent abdominal pain and sleep disorder had the largest absolute beta values, indicating the greatest caregiving impact on parents’ health and wellbeing, and dental problems and asthma had the smallest. Overall, the conditions showing the largest impact were highly consistent regardless of whether sum-scores or preference-weighted scores were used.

4 Discussion

In this study, we investigated the psychometric performance of the EQ-HWB-S in parents of children with and without health conditions. Overall, the EQ-HWB-S performed well in this large sample of parents, suggesting that it is a valuable instrument for measuring spillover effects for economic evaluation. We found significant differences between known groups, measured using both level sum-scores and preference-weighted scores, all in the expected directions. Effect sizes were moderate overall and similar between female and male parents. The significant differences in EQ-HWB-S scores between the health condition and general population samples remained after controlling for demographic variables. Items hypothesised to show larger mean differences between parents of children with and without health conditions did so. These results are consistent with the limited previous research in caregiver populations [17, 30].

In terms of response distributions, the mobility item was highly skewed, with few mobility problems reported in the sample; however, significant mobility problems were not expected in this parent population. As expected, EQ-HWB-S items 3–8 (exhaustion, loneliness, cognition, anxiety, sad/depressed and control) showed larger decrements in wellbeing than items 1, 2 and 9 (mobility, activities and pain). There were consistent differences in response distributions between the health condition and general population samples, which were corroborated by further analysis using partial effects.

In known group validity testing, both EQ-HWB-S sum-scores and preference-weighted scores differed between groups across all tests in the hypothesised directions, suggesting that both scoring approaches were sensitive to group differences. The two approaches gave similar results in these tests; however, the effect sizes for the preference-weighted scores were slightly lower, probably because the pilot weights place more weight on mobility and pain than on items such as exhaustion, loneliness, cognition and control, on which this population reported more problems [12]. As expected in this population, mean differences between groups were larger for exhaustion, loneliness, cognition, mental health and control than for mobility, activities and pain.

We found similar patterns in the child health condition sample and the general population sample, with exhaustion, cognition and anxiety showing the most problems in both cohorts. However, parents of children with health conditions had lower preference-weighted scores and higher sum-scores (both indicating lower wellbeing) than the general population sample. The high levels of exhaustion found in this study are consistent with previous research on parents’ quality of life [31], particularly for parents of children with health conditions [32, 33]. The hospital sample was closer to the general population than most of the child health condition samples, possibly because their conditions were more likely to be acute and resolve quickly, or were being actively managed in a way that minimised their impact.

The baseline comparison between parents with and without a child in one of the health condition samples (child health condition samples versus general population) showed significant demographic differences between groups. We therefore conducted a regression analysis to determine whether the differences in EQ-HWB-S scores between the groups remained after controlling for demographic variables. The difference between the child health condition samples and the general population was maintained, and this variable had the largest coefficient in the model, suggesting that differences in EQ-HWB-S scores between parents with and without children with health conditions are not explained by demographic circumstances. The regression analysis confirmed our hypothesis that parents of a child with a health condition had lower EQ-HWB-S preference-weighted scores (lower wellbeing) than those in the general population sample, even when controlling for other factors such as holding a healthcare card (an indicator of adversity), which was also significant in the model.

Given recent evidence on the importance of analysing results separately by sex in clinical research, and with journals increasingly requiring separate analyses for female and male participants [34], it was important to determine how well the EQ-HWB-S performed for male caregivers, who made up only 18% of the sample. The burden of unpaid work is disproportionately carried by women, which exposes them to greater mental health risk [35]. Our sample reflects these societal norms, in which women (82% of this sample) may be expected to carry a higher caregiving load than men. The analysis by gender indicated that the EQ-HWB-S appeared fit for purpose for both female and male parents. The effect size of the differences in EQ-HWB-S scores between caregivers of children with and without health conditions was higher for women than for men in all three health condition known group analyses, suggesting that women may carry more of the load when caring for a child with a health condition. The effect sizes for the COVID-19 variable were also larger for women, consistent with recent evidence [36].

4.1 Strengths and Limitations

The current study features a large dataset with responses from over 6000 children and families, with data on specific child health conditions as well as a general population sample. This allowed detailed exploration of differences between groups and makes a valuable addition to the literature on the validity of the EQ-HWB-S in this population. A limitation of this study was that we did not have information on how much of the caregiving burden fell to each parent, or on other caring roles that participants may undertake, such as caring for elderly parents. This is pertinent to the analysis by gender: fathers may do less caregiving overall than mothers, which could account for the slightly lower effect sizes among men. Future studies should therefore collect information on caregiving intensity and duration, to further examine differences between groups and explore caregiving impact. The 0.34% of participants who did not specify either male or female as their gender were not included in the analysis because this group was very small. We did not have a variable for parent health status, which would have helped control for potential confounding in the regression analysis. Because data were collected at a single time point, we were unable to assess the responsiveness of the EQ-HWB-S to change or its test–retest reliability. Further, a head-to-head study of the EQ-HWB-S against other caregiver measures would help determine how well this instrument compares with other options for measuring caregiver health and wellbeing.

The level sum-score has known limitations: quite different response profiles can produce the same sum-score, and giving equal weight to each dimension implicitly assumes that all dimensions are equally important [37]. To our knowledge, this is the first study to apply the pilot UK preference weights to EQ-HWB-S data. We have demonstrated the use of these weights and compared them with sum-scores throughout the paper. A limitation of using these weights is that other preference weights, particularly ones developed for an Australian population, could lead to somewhat different results. Here, though, the sum-scores and preference-weighted scores are largely complementary, and the statistical significance of the tests did not differ between the preference-weighted scores and the level sum-scores. Several other caregiver health-related quality of life (HRQoL) instruments, such as the CarerQol, the Caring Experience Scale and the ASCOT-Carer (adult social care), have utility values available. The key difference is that these instruments focus only on carer quality of life, whereas the EQ-HWB-S is a generic instrument designed to capture carer, patient and social care quality of life, enabling benefits to be measured in the same units for both patients and their caregivers. In future studies we plan to compare the EQ-HWB-S with the CarerQol to further understand the strengths and weaknesses of the EQ-HWB-S.
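To illustrate why equal weighting is a strong assumption, the Python sketch below scores two hypothetical response profiles that share the same level sum-score. It assumes five response levels per item (1 = no problems) and uses the nine items in the order given in the Discussion. The per-level decrements are invented for illustration only; they are NOT the pilot UK preference weights, and the linear "1 minus decrements" scoring is a simplifying assumption rather than the actual value set.

```python
# Items in EQ-HWB-S order, as listed in the Discussion (items 1-9).
ITEMS = ["mobility", "activities", "exhaustion", "loneliness", "cognition",
         "anxiety", "sad_depressed", "control", "pain"]

# HYPOTHETICAL utility decrement per level above "no problems".
# Illustrative only: not the pilot UK weights. Mobility and pain are
# weighted more heavily, echoing the pattern described in the text.
DECREMENT_PER_LEVEL = {
    "mobility": 0.08, "activities": 0.05, "exhaustion": 0.02,
    "loneliness": 0.02, "cognition": 0.02, "anxiety": 0.03,
    "sad_depressed": 0.03, "control": 0.02, "pain": 0.07,
}

def sum_score(profile):
    """Level sum-score: every item counts equally (levels 1-5 assumed)."""
    return sum(profile[item] for item in ITEMS)

def weighted_score(profile):
    """Toy preference-weighted score: 1 minus item-specific decrements."""
    loss = sum(DECREMENT_PER_LEVEL[item] * (profile[item] - 1) for item in ITEMS)
    return 1.0 - loss

# Profile A: moderate problems with exhaustion, cognition and control.
profile_a = {item: 1 for item in ITEMS}
profile_a.update(exhaustion=3, cognition=3, control=3)

# Profile B: moderate problems with mobility, activities and pain.
profile_b = {item: 1 for item in ITEMS}
profile_b.update(mobility=3, activities=3, pain=3)

for name, profile in [("A", profile_a), ("B", profile_b)]:
    print(name, sum_score(profile), round(weighted_score(profile), 2))
```

Both profiles have a sum-score of 15, yet under these toy weights profile A scores about 0.88 and profile B about 0.60. If a value set weights mobility and pain more heavily, a population whose problems lie mainly in exhaustion, cognition and control will show smaller preference-weighted differences than sum-score differences, which is one possible reading of the slightly lower effect sizes observed for preference-weighted scores.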

4.2 Conclusions

This study found that the EQ-HWB-S performed well and showed good known group validity in a population of parents of children with and without health conditions. The measure also appeared to be valid for both female and male parents. Owing to the large sample size and the range of variables available for defining known groups, this paper adds to the growing literature on the use of the EQ-HWB-S, both generally and specifically in this population. Future work should investigate the EQ-HWB-S in this population in more depth, to determine its responsiveness to change and test–retest reliability, and whether the instrument can detect differences in caregiving intensity and duration in known group validity testing.