Health-related Quality of Life using the EQ-5D-5L: normative utility scores in a Dutch female population

Purpose Normative utility scores represent the health-related quality of life of the general population, are of utmost importance in cost-effectiveness studies and should reflect relevant sexes and age groups. The aim of this study was to estimate EQ-5D-5L normative utility scores in a population of Dutch females, stratified by age, and to compare these scores to those of female populations of three other countries. Methods Dutch women completed the EQ-5D-5L online between January and July 2020. Mean normative utilities were computed using the Dutch EQ-5D-5L value set, stratified by age, tested for differences using the Kruskal–Wallis test, and compared to normative utility scores of female populations elsewhere. Additionally, to support the use of the Dutch EQ-5D-5L data in other settings, normative utility scores were also calculated by applying the value sets of Germany, the United Kingdom and the USA. Results Data of 9037 women were analyzed and the weighted mean utility score was 0.911 (SD 0.155, 95% CI 0.908–0.914). The mean normative utility scores differed between age groups, showing lower scores in older females. Compared to other normative utility scores of female populations, Dutch mean utilities were consistently higher except for age groups 18–24 and 25–34. With the three country-specific value sets, new age-specific mean normative utility scores were provided. Conclusion This study provides mean normative utility scores of a large cohort of Dutch females per age group, which were found to be lower in older age groups. Utility scores calculated with three other value sets were made available. Supplementary Information The online version contains supplementary material available at 10.1007/s11136-022-03271-3.


Plain English Summary
Health-related quality of life is a measure of the impact of disease and treatment on an individual's disability and daily functioning. Health-related quality of life outcomes are gathered using questionnaires (e.g. EQ-5D-5L) and respondents' answers can be converted into a single utility score that reflects an individual's health state at a particular point in time. These utility scores are used in cost-effectiveness studies. Utilities that are obtained in the general population, instead of in patients with a specific disease, are called normative utilities. Differences in normative utility scores between countries, age groups and genders have been found, and choosing the most accurate set of normative utility scores is important. However, Dutch age- and gender-specific normative utility scores for females are currently not available. This study converted the EQ-5D-5L results of 9037 women into mean normative utility scores stratified by age. Relatively high mean normative utility scores for the EQ-5D-5L in Dutch females were found in all age groups compared to female populations of other countries, with the lowest scores in older women. The EQ-5D-5L normative utility scores calculated with Dutch data and the value sets of Germany, the United Kingdom and the USA in this study support the use of the Dutch data in international cost-effectiveness studies when age- and country-specific normative utility scores for women are not available.

Introduction
The effectiveness of a health care intervention or strategy can be measured in a variety of ways. A commonly used method is measuring and comparing the Health-related Quality of Life (HrQoL) between groups. HrQoL is a measure of the impact of disease and treatment on an individual's disability and daily functioning [1]. It includes factors that are part of an individual's health, without non-health aspects such as economic circumstances, and is often used in cost-effectiveness studies [2]. HrQoL outcomes are gathered using questionnaires and respondents' answers can be converted into a single utility score, usually between 0 and 1, that reflects the personal desirability of an individual's health state at a particular point in time [2]. The EQ-5D-5L is often recommended as the instrument to obtain utility scores [3]. To enable this conversion for EQ-5D-5L outcomes, pre-defined country-specific value sets have been developed [4].
In cost-effectiveness studies, utility scores are used to calculate quality-adjusted life years (QALYs) for all relevant health states. If utility scores are not available for these health states, assumptions about such utilities have to be made. However, assumptions are sub-optimal compared to objectively measured utilities, as they influence cost-effectiveness ratios and ultimately decision-making [5,6]. Besides utilities for disease-specific health states, utilities for the general population are also considered relevant. These so-called 'normative utility scores' can be used as a comparator for health profiles of patients based on subgroups with similar age and gender. Additionally, they can be used to compensate for a loss in HrQoL due to factors that are not caused by the disease or intervention of interest [7]. Currently, many cost-effectiveness studies assume a utility of 1 (reflecting perfect health) for the general population. However, Versteegh et al. obtained utilities in a general Dutch population and the results suggested that utilities of the general population tend to be below one [8]. This means that cost-effectiveness studies may overestimate the health of the general population, and thereby overestimate the loss in utility score caused by a disease or intervention. Therefore, up-to-date normative utility scores are needed for use in cost-effectiveness studies.
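The overestimation described above can be illustrated with a small worked sketch. The patient utility (0.75) and the 10-year time horizon are hypothetical values chosen only for illustration; 0.911 is the weighted mean utility reported in this study. Utilities are assumed constant over time and discounting is ignored, which a real analysis would normally not do.

```python
def qaly_loss(reference_utility, patient_utility, years):
    """QALYs lost relative to a reference utility.

    Assumes constant utilities over the whole period and no discounting.
    """
    return (reference_utility - patient_utility) * years


patient_utility = 0.75  # hypothetical patient utility, for illustration only
years = 10              # hypothetical time horizon

# Reference = perfect health (utility 1), the common simplifying assumption.
loss_vs_perfect = qaly_loss(1.0, patient_utility, years)

# Reference = weighted mean normative utility reported in this study.
loss_vs_norm = qaly_loss(0.911, patient_utility, years)

# Part of the loss that is attributed to the disease only because the
# general population was assumed to be in perfect health.
overestimate = loss_vs_perfect - loss_vs_norm

print(round(loss_vs_perfect, 2), round(loss_vs_norm, 2), round(overestimate, 2))
```

Under these assumptions, the difference between the two estimates equals (1 - 0.911) multiplied by the number of years, independent of the patient utility.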
Other countries have calculated normative utility scores using the EQ-5D and showed differences between genders [9][10][11]. In studies on women's health, using gender-specific normative EQ-5D utility scores of females only may be more accurate than using population norms. Janssen et al. published EQ-5D index value population norms for 20 countries in Europe including the Netherlands [12,13]. Data of 2367 people, identified between 2001 and 2003, were used to calculate age-stratified normative utility scores [14]. However, these results were based on the EQ-5D-3L, and the Dutch normative data for the EQ-5D-5L that were published thereafter were not classified by gender [8,13]. This is a drawback for cost-effectiveness studies among only male or only female populations.
Therefore, the aim of this study was to obtain EQ-5D-5L normative utility scores in a female Dutch cohort, stratified by age. In addition, these normative utility scores were compared to normative utility scores of female cohorts of other countries. Furthermore, three different country-specific value sets were applied to the answers of the EQ-5D-5L of the Dutch cohort. This analysis was conducted to illustrate the impact of using different value sets on age-specific mean normative utility scores, and to enable the use in cost-effectiveness studies in populations for which country-specific normative utility scores for women are not available.

Study participants
Data were collected in a study that initially obtained normative data for the Breast-Q (a breast cancer specific quality of life questionnaire) (Oemrawsingh et al. (2021), in press). Dutch women were invited to complete a web-based survey that was disseminated through social media platforms of the Erasmus Medical Center between January and July 2020. Because the researchers focused on breast cancer, normative data should be based on women unencumbered by the diagnosis of breast cancer. Therefore, women who were previously diagnosed with breast cancer were excluded from the survey.
Besides the Breast-Q, the survey also included the EQ-5D-5L. The current study made use of these EQ-5D-5L data.

Health related quality of life measured with the EQ-5D-5L
The Dutch version of the EQ-5D-5L was used to measure HrQoL [3]. The EQ-5D-5L is a non-disease-specific instrument and consists of five dimensions (mobility, self-care, usual activities, pain/discomfort and anxiety/depression), each with five levels of functioning, ranging from no problems to extreme problems. In total, 3125 different health states (five levels on each of five dimensions) can be described. A quality-adjustment weight or "utility" is a number anchored at 0 and 1, with "perfect health" carrying a weight of 1 and death carrying a weight of 0. A utility score below 0 is possible when a health state is valued as worse than death. Utilities are calculated by applying pre-defined values to the specific health state indicated by a respondent. Utilities in this study were computed according to the Dutch tariffs for the EQ-5D-5L as established by Versteegh et al. [8].
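The descriptive system and the conversion into a utility can be sketched as follows. This is a minimal illustration, not the Dutch tariff: the decrements in TOY_DECREMENTS are hypothetical numbers, and a purely additive model is assumed, whereas published value sets may contain additional terms. The names DIMENSIONS, ALL_STATES, TOY_DECREMENTS and utility are introduced here for illustration only.

```python
from itertools import product

DIMENSIONS = ("mobility", "self_care", "usual_activities",
              "pain_discomfort", "anxiety_depression")
LEVELS = (1, 2, 3, 4, 5)

# Every health state as a 5-digit profile, e.g. "11111" (no problems on any
# dimension) or "55555" (extreme problems on all dimensions).
ALL_STATES = ["".join(map(str, levels))
              for levels in product(LEVELS, repeat=len(DIMENSIONS))]

# HYPOTHETICAL decrements per dimension for levels 1..5 (level 1 = 0).
# These are NOT the Dutch value set; they only show the mechanics.
TOY_DECREMENTS = {
    "mobility":           (0.0, 0.03, 0.06, 0.15, 0.20),
    "self_care":          (0.0, 0.03, 0.06, 0.15, 0.20),
    "usual_activities":   (0.0, 0.04, 0.08, 0.15, 0.20),
    "pain_discomfort":    (0.0, 0.06, 0.09, 0.30, 0.35),
    "anxiety_depression": (0.0, 0.07, 0.14, 0.35, 0.40),
}


def utility(state, decrements=TOY_DECREMENTS):
    """Utility of a 5-digit profile under an additive toy value set."""
    if len(state) != len(DIMENSIONS) or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L profile: {state!r}")
    total = sum(decrements[dim][int(ch) - 1]
                for dim, ch in zip(DIMENSIONS, state))
    return 1.0 - total


print(len(ALL_STATES))       # number of distinct health states
print(utility("11111"))      # full health
print(utility("55555") < 0)  # the worst state is valued below death here
```

With these toy numbers the worst state receives a negative utility, which mirrors the possibility of states valued as worse than death.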

Statistical analysis
Descriptive statistics, including standard deviations and confidence intervals, were calculated to present the mean normative EQ-5D-5L index scores per age group. Age was categorized into seven subgroups: 18-24, 25-34, 35-44, 45-54, 55-64, 65-74 and ≥ 75 years. A weighted mean normative utility score was calculated taking into account the population size per age group of the Dutch population in 2020 (see Online Appendix, Fig. 1) [15]. Because the data were not normally distributed, the Kruskal-Wallis test was used to compare utility scores between the age groups. The data analyses were performed using IBM SPSS Statistics (Version 25) and R (Version 1.2).
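The two calculations described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the analysis code of the study: the group means, population shares and sample values below are hypothetical, and the helper names weighted_mean and kruskal_wallis_h are introduced here. The function returns the tie-corrected Kruskal-Wallis H statistic only; a p-value would require the chi-square distribution with (number of groups - 1) degrees of freedom.

```python
from collections import Counter


def weighted_mean(group_means, weights):
    """Mean of group means, weighted by population share per group."""
    total_weight = sum(weights[g] for g in group_means)
    return sum(group_means[g] * weights[g] for g in group_means) / total_weight


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    counts = Counter(pooled)

    # Tied observations share the average of the ranks they occupy.
    rank_of = {}
    next_rank = 1
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = next_rank + (t - 1) / 2
        next_rank += t

    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    tie_correction = 1 - sum(t**3 - t for t in counts.values()) / (n**3 - n)
    if tie_correction == 0:
        raise ValueError("all observations are identical")
    return h / tie_correction


# Hypothetical group means and population shares, for illustration only.
means = {"18-24": 0.92, "25-34": 0.93, "35-44": 0.91}
shares = {"18-24": 0.2, "25-34": 0.3, "35-44": 0.5}
print(round(weighted_mean(means, shares), 3))

# Hypothetical samples without ties.
print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 2))
```

The tie correction matters for EQ-5D data in particular, because a large share of respondents typically has the same utility of 1.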

Comparisons with three other countries
The mean normative utility scores per age group were compared to normative utility scores for female populations in studies performed in Germany, South Australia, and the USA (US) [9][10][11]. Furthermore, the country-specific value sets used in these studies (i.e. the value sets of Germany, the United Kingdom (UK) and the US) were also applied to the EQ-5D-5L data to convert them into utility scores [16][17][18].
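Re-scoring the same responses with several value sets and summarizing them per age group can be sketched as below. The two value sets are hypothetical toy sets that use one decrement per level for all dimensions; the real German, UK and US value sets are structured differently and must be taken from their publications. The respondents and the names age_band, TOY_VALUE_SETS and mean_utility_by_age are likewise assumptions made for this illustration.

```python
from statistics import mean

# Age bands as defined in the statistical analysis; None marks the open band.
AGE_BANDS = [(18, 24), (25, 34), (35, 44), (45, 54),
             (55, 64), (65, 74), (75, None)]


def age_band(age):
    """Label of the age band that contains `age`."""
    for low, high in AGE_BANDS:
        if age >= low and (high is None or age <= high):
            return f"{low}-{high}" if high is not None else f">={low}"
    raise ValueError(f"age {age} is below the youngest band")


# HYPOTHETICAL value sets: decrement for levels 1..5, shared by all dimensions.
TOY_VALUE_SETS = {
    "toy_A": (0.0, 0.04, 0.08, 0.20, 0.30),
    "toy_B": (0.0, 0.03, 0.06, 0.18, 0.28),
}


def utility(state, level_decrements):
    """Utility of a 5-digit profile under a toy additive value set."""
    return 1.0 - sum(level_decrements[int(ch) - 1] for ch in state)


def mean_utility_by_age(respondents, level_decrements):
    """Mean utility per age band for (age, profile) pairs."""
    by_band = {}
    for age, state in respondents:
        by_band.setdefault(age_band(age), []).append(
            utility(state, level_decrements))
    return {band: mean(values) for band, values in by_band.items()}


# Hypothetical respondents: (age, EQ-5D-5L profile).
respondents = [(22, "11112"), (23, "11111"), (80, "21121")]

# The same responses scored once per value set.
norms = {name: mean_utility_by_age(respondents, value_set)
         for name, value_set in TOY_VALUE_SETS.items()}
print(norms)
```

Keeping the raw profiles separate from the value sets is what makes this kind of re-scoring possible: only the scoring step changes, not the collected data.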

Results
The total sample included 9037 females with a median age of 46.0 years (range 18-90 years). According to the responses on the individual EQ-5D-5L dimensions, most health problems were identified in the pain/discomfort (41.2%) and anxiety/depression (29.5%) dimensions (Table 1). The anxiety/depression dimension showed relatively high percentages of any health problems (level 2-5) in the younger age groups, which decreased with increasing age. Health problems in the other dimensions increased with age, which was most evident in the mobility dimension (Fig. 1). The mean utility score was 0.917 (SD 0.110, 95% CI 0.915-0.920) with a left-skewed distribution, as 44.7% had a utility score of 1 (n = 4037). The weighted mean utility score was 0.911 (SD 0.155, 95% CI 0.908-0.914).
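As a plausibility check, the interval for the weighted mean can be approximated from the reported summary statistics. The sketch below assumes a normal approximation with the full sample size as n; how the interval was actually computed for the weighted estimate is not stated, so this is an assumption. The function name mean_ci is introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


# Reported weighted mean and SD, with the total sample size.
lower, upper = mean_ci(0.911, 0.155, 9037)

# Share of respondents with a utility of 1, from the reported counts.
share_full_health = 4037 / 9037

print(round(lower, 3), round(upper, 3), round(100 * share_full_health, 1))
```

Under this approximation the bounds round to the reported 0.908 and 0.914, and the share of respondents in full health rounds to the reported 44.7%.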

Primary outcome
The mean normative utility score ranged from 0.929 (SD 0.102) (age group 25-34) to 0.881 (SD 0.081) (age group ≥ 75). The highest mean normative utility scores were found in the three youngest age groups (between age 18 and 44 years) (Table 2). After age 45, mean normative utilities decreased with increasing age, with the lowest mean utility scores in the oldest age group (≥ 75 years). The Kruskal-Wallis test revealed statistically significant differences in normative utility scores across the age groups (p < 0.001). However, absolute differences were small.

Comparisons with three other countries
Compared to published normative utility scores for female populations in Germany, the US and South Australia, our mean normative utilities were consistently higher except for age groups 18-24 and 25-34 (Table 3).
The mean utility scores were recalculated after applying the country-specific value sets of Germany, the UK, and the US to the EQ-5D-5L answers of our Dutch cohort. This resulted in slightly higher mean utility scores for all age groups with all three value sets (Table 2). The mean utility scores were the highest when the German value set was applied.

Discussion
We obtained normative utility scores using the EQ-5D-5L in a sample of 9037 Dutch females and found relatively high utility values for Dutch females aged 18 to ≥ 75 years. In general, the mean normative utilities were lower in the older age groups, although absolute differences were small. Applying the country-specific value sets of Germany, the UK and the US to the EQ-5D-5L answers of our Dutch sample resulted in consistently higher mean utility scores in all age groups as compared to the mean utility scores calculated with the Dutch value set. Our mean normative utility scores in the younger age groups were slightly lower than previously found in female populations of other countries [9][10][11]. This difference may be caused by the sampling method. Young people who are less healthy may spend more time on their computers, mobile phones or social media than healthy adolescents, who are possibly able to do more activities. Therefore, they might have been more likely to encounter the study invitation and more inclined to complete a questionnaire on their health. The normative utility data of female populations of other countries were collected between 2013 and 2017 [9][10][11]. The lower Dutch utilities in the younger age groups compared to those of previous studies might be explained by an increase in mental health problems in adolescents over the last years, as observed in the Netherlands [19]. The data of this study were collected during the start of the COVID-19 pandemic, which also led to more anxiety and mental health issues, particularly in females and adolescents, and may have contributed to lower utility scores [20]. In addition, the use of the Dutch value set appears to be partially responsible for the differences in utility scores in the younger age groups (up to 35 years), because the differences in utility became smaller when the German, UK, and US value sets were used.
In contrast, our mean normative utility scores in the older age groups were higher than those in female populations of other countries. Particularly in these age groups, the differences were enlarged by the use of the German, UK and US value sets. Therefore, these differences cannot be explained by the value sets themselves.
The oldest age group (≥ 75 years) showed a relatively high mean normative utility, as none of the participants reported level four or five in any of the dimensions. This might indicate that older Dutch women have a relatively good quality of life, and possibly better than older women elsewhere. In contrast to a recently published Russian article reporting normative utility scores, Dutch women did not show many problems in the self-care dimension in any age group [21]. In the current study, the frequency of having any problems in the anxiety/depression dimension decreased with increasing age, but was consistent across all age groups in the Russian population. Although the pattern of having any problems in the mobility dimension was similar in both studies, the frequency in the older age group was considerably higher in the Russian population [21]. However, the high mean normative utilities may also be related to most participants being between 75 and 80 years of age, and no one being older than 90 years. Because more health issues appear with increasing age, this may explain the differences with other studies if they included older participants [21][22][23]. In addition, the sample of older participants (n = 34) was relatively small, which reduces the generalizability. Another explanation is the use of social media as a recruitment method, which may have caused some selection bias. Older females who are able and willing to complete a questionnaire through an online survey are potentially in better health [24]. On the other hand, the internet is easily accessible in the Netherlands and internet use is higher than in most other western countries, also in older people [25]. Interestingly, Jiang et al. have shown differences in outcome between face-to-face and online sampling, with higher EQ-5D-5L index scores in the face-to-face population for most age groups [9]. However, the index scores of the older participants (i.e. above the age of 65) were slightly higher in the online population [9].
We found statistically significant differences in mean normative utility scores between the age groups. However, we expected larger age-specific absolute differences beforehand based on results of previous normative studies (both males and females) in the Netherlands [26]. Nevertheless, we recommend using age- and gender-specific reference values, as they are important for cost-effectiveness studies and can have a substantial effect on outcomes [5,6]. It would be interesting to investigate to what extent our age-specific values alter the outcomes of cost-effectiveness analyses. Of note, our normative utility scores are mainly intended to answer women-specific research questions, and they might not be directly comparable to future normative utility scores of Dutch males as they are not generated from the same sample.
The key strengths of our study are the use of the EQ-5D-5L to obtain normative utility scores and the large sample size. The EQ-5D-5L is more sensitive than the EQ-5D-3L version, which has several limitations (e.g. ceiling effects in patient populations, non-detection of small differences or changes in patients with mild conditions) [27][28][29]. Furthermore, the sample size of our cohort was substantially larger (at least three times) than the samples in previous studies, and in combination with the more sensitive 5-level version of the EQ-5D, our study may have resulted in more reliable outcomes [9][10][11][12]. Another strength is that we provide age-specific mean utility scores specifically for women. These could be used as an up-to-date reference point in research and Dutch health policy evaluations, such as breast and cervical cancer screening strategies, and health policies for pregnancy and childbirth. Importantly, our study did not gather demographic data, which makes it difficult to state anything about the representativeness of the population. We used a web-based survey that was disseminated through the institute's social media platforms, which are all accessible to the general population. To be able to complete the survey, access to the internet was required. Especially in the Netherlands, internet use has increased over the last decade and is nowadays extremely high, as 95% of the total population has access to the internet [30]. This makes the internet-user population very similar to the general population. Even back in 2013, the internet was the main source used to search for health information (83%) in the Netherlands, and social media is frequently used for this purpose [31]. The percentage of social media use is more than 90% for the age group of 18-54 years, and between 76 and 89% in the age group of 55-64 years of the Dutch population [32].
Although we cannot assume that all female internet users have seen our survey, we believe that the survey reached a large and representative part of the Dutch female population. Despite our large sample size, the group of elderly females was relatively small. In other countries where internet availability is less developed, using this sampling method might be more of an issue because certain populations are possibly left out.
To date, it is unclear if and to what extent utility measurements on a national level can be generalized to other countries. However, there are differences between the country-specific value sets, even between countries that were expected to have quite similar populations, socioeconomic status, health systems, or attitudes to health [13]. Therefore, using a country-specific value set is encouraged [33,34]. In this study, the value sets of three other countries were used to calculate utility scores based on the answers to the EQ-5D-5L of our Dutch female cohort. This was done to illustrate the impact of using different value sets on age-specific mean normative utility scores, and also to provide age-specific mean normative utility scores to be used in cost-effectiveness studies in countries for which country-specific normative utility scores for women are lacking. For example, if a breast cancer study were conducted in the UK, researchers would probably prefer to use the UK value set to determine the utilities in patients. To allow for proper comparisons with the general population, they should preferably also use normative utilities calculated with the UK value set. If age-specific mean normative utility scores for women in the UK are not available, the normative utility scores calculated with the UK value set in this study may be a good alternative. Reporting the normative utility scores for different value sets enlarges the applicability in multiple international studies.

Conclusions
In this study, we presented age-specific normative utility scores for the EQ-5D-5L in Dutch females using different value sets. We found lower mean normative utilities in older age groups. Relatively high normative utility scores were found in all age groups, compared to those in other female populations. Furthermore, utility scores were calculated with value sets of three other countries which can be used as normative comparisons in international patient populations.

Acknowledgements
The authors would like to thank all women who participated in this study.

Author contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by LMK and MEC. The first draft of the manuscript was written by LMK and MEC and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Funding
The authors have no relevant financial or non-financial interests to disclose.

Data availability
The data are not publicly available but are available for researchers who wish to apply their own country-specific EQ-5D-5L value set to the current dataset. Data shall only be shared with researchers upon reasonable request, at the discretion of the principal investigator.

Code availability
Not applicable.

Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethical approval
Formal approval from the local Medical Ethics Review Committee was waived as the Dutch Medical Research (Human Subjects) Act did not apply to this study.

Consent to participate
Informed consent was obtained from all individual participants included in the study.

Consent for publication
Not applicable.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.