Key Points for Decision Makers

This study developed population norms for EQ-5D-5L, the PROMIS-preference scoring system (PROPr) and SF-6D in Hungary. Given the differences in item content, type of response scale and recall period among these measures, understanding the extent of their variations in describing the population’s health status is crucial.

Generally, the most problems were reported on the SF-6D, followed by PROPr and the EQ-5D-5L. Problems related to physical functioning, self-care, usual activities/role limitations and pain increased with age, while mental health problems decreased in all three measures. Age, gender, education, employment, income, physical activity, medication use, body mass index, and having chronic conditions were found to be associated with utilities, depending on the instrument.

This study is the first to present PROPr population norms in any country, while simultaneously offering population norms for EQ-5D-5L, PROPr, and SF-6D.

1 Introduction

Generic preference-accompanied measures (PAMs) are commonly used in health technology assessments of new therapies and interventions. These instruments cover general aspects of health, such as physical functioning, mental functioning and pain, making them applicable across a broad range of health conditions [1]. PAMs consist of a descriptive system and preference weights (i.e. a value set), typically derived from societal preferences, that enable the assignment of health utilities to all possible health states described by the instrument [2]. These utilities allow for the estimation of quality-adjusted life-years (QALYs), a metric frequently used in cost–utility analysis [3]. Examples of such measures include the EQ-5D, Assessment of Quality of Life (AQoL), Short-Form 6-Dimension (SF-6D), Health Utilities Index (HUI) and 15D [4].

The most frequently used PAM at the international level is the EQ-5D [5, 6]. The EQ-5D originally had three response levels per domain (EQ-5D-3L) [7], which was later expanded to five response levels (EQ-5D-5L) [8], substantially improving its measurement properties [9]. The EQ-5D-5L is endorsed by pharmacoeconomic guidelines in many countries [5], including Hungary [10], and is the preferred PAM in 15 guidelines [5]. Over the past decades, more than 30 countries have developed their own country-specific EQ-5D-5L value set [11]. The validity, reliability and responsiveness of the EQ-5D-5L have been demonstrated in numerous acute and chronic health conditions and multiple populations [12].

Another widely adopted PAM is SF-6D, derived from the 36-item Short-Form (SF-36) or the 12-item Short-Form (SF-12), designed to estimate utilities by capturing six domains of health [13, 14]. Several countries list the SF-6D as an applicable measure in their health technology assessment guidelines, alongside other options [5]. So far, 12 countries have established SF-6D value sets [15]. Similar to the EQ-5D-5L, it has demonstrated strong psychometric performance across multiple health conditions [16,17,18].

Recently, the Patient-Reported Outcomes Measurement Information System (PROMIS) adult generic profile measures have been receiving increasing attention [19]. Developed using advanced psychometric methods (item response theory) in the USA [20], PROMIS is based on item banks covering more than 100 different health areas [21]. Among the three PROMIS profiles for adults (PROMIS-57, -43 and -29), PROMIS-29 is the most widely used [19]. When complemented with two additional items relating to cognitive function (PROMIS-29+2), it is suitable for measuring health utilities using its value set, the PROMIS-preference scoring system (PROPr) [22]. Currently, only one country-specific value set is available for the PROPr [22]. The measurement properties of PROMIS-29, PROMIS-29+2 and PROPr have been tested in various settings [23,24,25,26,27,28,29].

The interpretation of generic PAM results may involve comparisons with reference values from the general population, known as population norms. These norms play an important role in the measurement of disease burden by providing age- and gender-specific reference values of the general population to which patients’ health status can be compared. Additionally, they can be used to assess the population’s unmet needs and to identify changes in general health status over time and across countries. Currently, Hungarian population norms exist for utility values of the EQ-5D-3L [30] and 15D generic PAMs [31], as well as for summary scores or T-scores of the SF-36 [32] and two PROMIS generic health status measures (PROMIS-29+2 and PROMIS Global Health) [27, 33]. The EQ-5D-5L has population reference values established in more than 30 countries [34]. The SF-6D also has several population norms, although considerably fewer than the EQ-5D-5L. To the best of our knowledge, no studies have established PROPr population norms thus far.

The EQ-5D-5L, PROPr and SF-6D share multiple domains covering similar constructs of health (e.g. physical function, pain/discomfort). However, each PAM was developed using different approaches and varies across several characteristics, such as item content, wording, number of items per domain, response levels per item, type of response scale, and recall period. Consequently, it is important to understand the extent to which these measures differ in describing the population’s health status. Additionally, presenting population norms for multiple questionnaires from a common sample is rare [35,36,37], making it a unique opportunity for a comprehensive comparison. Although the Hungarian versions of EQ-5D-5L, PROPr and SF-6D showed good validity [26, 27, 38,39,40,41,42,43,44,45,46,47], their population norms have not yet been developed. This study therefore primarily aims to develop general population reference values for the EQ-5D-5L, PROPr and SF-6D based on a large sample of the adult general population in Hungary. We also compare the population’s health status on the three instruments and explore their associations with sociodemographic and health-related variables.

2 Methods

2.1 Study Design

A cross-sectional online survey was administered to the Hungarian adult general population, with a target sample size of 1700 [26, 27, 33, 48]. Participants were recruited by a panel company in November 2020 and, upon completing the questionnaire, received survey points that could be redeemed for rewards. ‘Soft’ quotas were set to obtain a broadly representative sample of the Hungarian population in terms of age, gender, education, place of residence and geographical region [49]. The Research Ethics Committee of the Corvinus University of Budapest granted permission to conduct the survey (no. KRH/343/2020).

2.2 Survey Content and Outcome Measures

Respondents completed the Hungarian versions of EQ-5D-5L, PROMIS-29+2 v2.1 and SF-36v1 in a fixed order. The main characteristics of the descriptive systems and value sets of these PAMs are described in Online Resource 1. Sociodemographic (age, gender, education, place of residence, geographical region, employment, marital status and income) and health-related information (height, weight, self-perceived health, providing informal caregiving, exercising, smoking, alcohol consumption, prescription or over-the-counter medication use and the history of physician-diagnosed chronic conditions) were also collected. The respondents’ chronic health conditions were recorded in two steps. Firstly, respondents were asked to indicate any experienced chronic health conditions or chronic consequences of acute conditions in the last 12 months; then they were required to mark those that had been diagnosed by a physician. The list of health conditions was compiled on the basis of the European Health Interview Survey (EHIS) with the addition of some other conditions common in the general population [50]. Respondents were asked to estimate the amount of time they spend on sports or physical work each week in hours and minutes. The survey also included a question on the number of medications regularly taken. There were no missing data, as answering all questions was mandatory in the survey.

2.2.1 EQ-5D-5L

EQ-5D-5L is a generic PAM, consisting of a descriptive system and a visual analogue scale (EQ VAS) with endpoints of 0 (the worst health you can imagine) and 100 (the best health you can imagine) [7, 8]. The descriptive system has five domains of health (mobility, self-care, usual activities, pain/discomfort and anxiety/depression), each consisting of one item with five response levels on a severity scale (“no problems” to “unable to/extreme problems”), thereby describing a total of 3125 unique health profiles [8]. The measure asks respondents to report their current health (“your health today”). In the present study, utilities were computed using the Hungarian value set, which had been developed using the composite time trade-off method [51]. Utilities range from −0.848 to 1 (full health), where negative values represent health states considered worse than dead.

2.2.2 PROPr

The PROMIS-preference scoring system (PROPr) is a generic PAM based on the PROMIS framework. In our survey, participants completed the PROMIS-29+2 v2.1, which is an extended version of the PROMIS-29 adult profile measure [52]. The PROMIS-29 descriptive system covers seven health domains [physical function, anxiety, depression, fatigue, sleep disturbance, ability to participate in social roles and activities (hereafter social roles) and pain interference], each consisting of four items with five response levels, and a 0–10 pain intensity numeric rating scale. Additionally, PROMIS-29+2 comprises an eighth cognitive function domain (Cognitive Function – Abilities v2.0), which involves two items. Response scales vary across items and include severity (“not at all” to “very much”), frequency (“never” to “always”), interference with functioning (“not at all” to “very much”), global rating (“very good” to “very poor”) and capability (“without difficulty” to “unable to”) formats. Respondents are mostly asked to recall their health over the past 7 days, whereas the recall period is unspecified for physical function and social roles. Combining responses on seven PROMIS-29+2 health domains (all but anxiety) allows the calculation of PROPr utilities, defining a total of 217,238,121 unique health profiles [22]. In this study, the US PROPr value set was used (the only currently available PROPr value set), which had been developed using the standard gamble method. Utilities range from −0.022 to 0.954.

2.2.3 SF-6D

In this study, the Short-Form 6-Dimension (SF-6D) dimension scores and utilities were derived from the eight-dimensional SF-36v1 generic health status measure [13, 14]. The SF-6D combines 11 items of six SF-36 domains (physical functioning, role limitation, social functioning, pain, mental health and vitality). Thus, SF-6D comprises six domains, each represented by one item. These items have four to six response levels measuring severity (“no limitations” to “a lot of limitations”), frequency (“all of the time” to “none of the time”) or interference with functioning (e.g. “no pain” to “pain that interferes with one’s normal work extremely”). This descriptive system results in a total of 18,000 unique health profiles. Respondents recall their health over a 4-week period, except for the physical functioning domain, which asks about their current health (“now”). In the absence of a Hungarian value set, we used the SF-6D value set of the UK, developed using a standard gamble method [14]. The theoretical range of utilities is 0.301–1.
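The profile counts quoted for the three descriptive systems can be checked with simple arithmetic. The sketch below is illustrative only and rests on two assumptions that are not stated in the text: the per-domain split of SF-6D response levels (the text says only "four to six"), and the idea that a PROPr profile is defined by the raw sum score of each scored domain. The names `raw_score_count` and `profile_count` are hypothetical helpers.

```python
from math import prod

# EQ-5D-5L: five domains with five response levels each (from the text).
EQ5D5L_LEVELS = [5, 5, 5, 5, 5]

# SF-6D (SF-36 version). ASSUMPTION: levels per domain in the order physical
# functioning, role limitation, social functioning, pain, mental health, vitality.
SF6D_LEVELS = [6, 4, 5, 6, 5, 5]


def raw_score_count(n_items: int, n_levels: int) -> int:
    """Number of distinct raw sum scores for n_items items scored 1..n_levels."""
    return n_items * (n_levels - 1) + 1


# PROPr. ASSUMPTION: a profile is one raw sum score per scored domain, i.e.
# six four-item domains (all PROMIS-29 domains but anxiety) plus the
# two-item cognitive function domain.
PROPR_SCORES = [raw_score_count(4, 5)] * 6 + [raw_score_count(2, 5)]


def profile_count(levels_per_domain) -> int:
    """Total number of unique health profiles for a descriptive system."""
    return prod(levels_per_domain)


for name, levels in [("EQ-5D-5L", EQ5D5L_LEVELS),
                     ("PROPr", PROPR_SCORES),
                     ("SF-6D", SF6D_LEVELS)]:
    print(f"{name}: {profile_count(levels):,} profiles")
```

Under these assumptions the products match the totals reported above (3125, 217,238,121 and 18,000).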

2.3 Statistical Analysis

Before the statistical analysis, data quality was assessed by the research team. Some inconsistencies were observed, indicating that certain EQ-5D-5L responses were inadvertently recorded as level 5 responses, which can be attributed to an error in the online survey interface. The research team carefully examined each level 5 response and compared it with other information provided by the respondent (e.g. self-reported health on other measures, health information and physician-diagnosed chronic health conditions). As a result, a total of 69 participants were excluded from the sample. Detailed information on the exclusion process can be found elsewhere [26].

Age was categorised into seven groups: 18–24, 25–34, 35–44, 45–54, 55–64, 65–74 and 75+ years [53]. Data on sports and physical work were dichotomized using a cut-off value of 150 min of weekly physical activity, based on the recommendation of the World Health Organization [54]. Responses on medication use were recoded into two categories: 1–4 types and 5 or more types per day (i.e. polypharmacy) [55]. Respondents were asked about their height and weight, on the basis of which their body mass index (BMI) was calculated, and they were grouped into four categories: < 18.5 kg/m² (underweight), 18.5–24.9 kg/m² (normal), 25–29.9 kg/m² (overweight) and ≥ 30 kg/m² (obese) [56].
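The recoding rules described above can be sketched as follows. This is a minimal illustration, not the authors' code: all function names are hypothetical, the BMI bands are treated as half-open intervals (an assumption that closes the gaps between, for example, 24.9 and 25), and respondents with exactly 150 min of weekly activity are assumed to meet the threshold.

```python
AGE_BANDS = [(18, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 74)]


def age_group(age: int) -> str:
    """Map age in years to one of the seven age groups used in the study."""
    if age < 18:
        raise ValueError("the sample covers adults (18+) only")
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "75+"


def bmi_category(weight_kg: float, height_m: float) -> str:
    """Compute BMI (kg/m^2) and map it to the four categories in the text."""
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:   # ASSUMPTION: 18.5-24.9 read as [18.5, 25)
        return "normal"
    if bmi < 30:   # ASSUMPTION: 25-29.9 read as [25, 30)
        return "overweight"
    return "obese"


def meets_activity_threshold(minutes_per_week: float) -> bool:
    """True if weekly sports/physical work reaches the 150-minute cut-off."""
    return minutes_per_week >= 150  # ASSUMPTION: exactly 150 counts as meeting it


def is_polypharmacy(n_medication_types: int) -> bool:
    """True for five or more medication types taken regularly."""
    return n_medication_types >= 5


# Hypothetical respondent: 52 years old, 82 kg, 1.75 m, 2 h 10 min of activity, 3 medications.
print(age_group(52), bmi_category(82, 1.75),
      meets_activity_threshold(2 * 60 + 10), is_polypharmacy(3))
```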

All analyses were performed for the EQ-5D-5L, SF-6D and PROPr descriptive systems and the EQ VAS. Descriptive characteristics of the sample were computed. The relative frequency of responses to each domain of each questionnaire was calculated for the entire sample and then determined according to gender and age groups. Notably, for PROMIS-29+2, T-scores were not calculated; these are presented for each domain in a previous publication [27]. For all three measures, responses to each domain were dichotomized (“no problems” or “any problems”). Corresponding health domains were directly compared across the three measures (e.g. EQ-5D-5L mobility, PROPr physical function and SF-6D physical functioning). Pearson’s χ² test was used to analyse the differences in the relative frequency of respondents with any problems among the corresponding domains of the three measures. The same test was employed to assess the differences between the responses of males and females, as well as across age groups within each domain of each measure. For each age group, the proportion of respondents in the best possible health state (i.e. no problems in any domain) was computed for all three instruments and the EQ VAS. For the latter, the maximum score of 100 represented the best possible health. This was also separately computed for males and females. Mean level scores (LS) were computed for each domain of each measure by transforming response levels to a 0–100 scale (e.g. EQ-5D-5L: level 1 = 0, level 2 = 25, level 3 = 50, level 4 = 75 and level 5 = 100), where a higher score denotes a worse health status [57]. Student’s t-test (two subgroups) or analysis of variance (three or more subgroups) was applied to test the differences between subgroup means.
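The dichotomization and level score (LS) transformation described above can be illustrated with a short sketch. It assumes that level 1 is always the best level and that levels are equally spaced on the 0–100 scale, which reproduces the EQ-5D-5L example given in the text; applying the same linear rule to domains with four or six levels is an assumption, and the function names are hypothetical.

```python
def level_score(level: int, n_levels: int) -> float:
    """Rescale a response level (1 = best) to 0-100, where higher is worse."""
    if not 1 <= level <= n_levels:
        raise ValueError(f"level must be between 1 and {n_levels}")
    return (level - 1) / (n_levels - 1) * 100


def mean_level_score(levels, n_levels: int) -> float:
    """Mean LS for one domain across respondents."""
    scores = [level_score(lv, n_levels) for lv in levels]
    return sum(scores) / len(scores)


def share_with_any_problems(levels) -> float:
    """Proportion of respondents reporting any problems (level above 1)."""
    return sum(1 for lv in levels if lv > 1) / len(levels)


# Hypothetical EQ-5D-5L pain/discomfort responses from eight respondents.
pain = [1, 1, 2, 3, 1, 2, 1, 4]
print(mean_level_score(pain, n_levels=5), share_with_any_problems(pain))
```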

Mean utilities and their 95% confidence intervals (CI) were estimated for the three instruments and EQ VAS in the total sample, on the basis of the sociodemographic characteristics and 30 chronic health condition groups reported by the respondents (e.g. hypertension, diabetes, musculoskeletal diseases, anxiety and depression). The differences between the mean utilities of these subgroups were examined with Student’s t-test or analysis of variance, where applicable.
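As an illustration of how a subgroup mean and its 95% CI could be obtained, the sketch below uses a normal approximation. This is an assumption: the text does not state which interval was used, and with subgroup sizes like those in this sample a t-based interval would be nearly identical for large subgroups but wider for small ones. The function name `mean_ci` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(values, level: float = 0.95):
    """Return (mean, lower, upper) using a normal-approximation confidence interval."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    m = mean(values)
    se = stdev(values) / sqrt(n)               # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return m, m - z * se, m + z * se


# Hypothetical utilities for one subgroup (not study data).
utilities = [1.0, 0.92, 0.85, 0.77, 1.0, 0.64, 0.95, 0.88]
m, lower, upper = mean_ci(utilities)
print(f"mean {m:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```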

Associations of sociodemographic and health-related characteristics of respondents with EQ VAS scores and EQ-5D-5L, PROPr and SF-6D utilities were explored with multivariate linear regression models. Heteroskedasticity was evaluated by the Breusch–Pagan test and corrected using robust standard errors. The models included sociodemographic and health-related characteristics with a sample size of at least 30 cases per subgroup, as independent variables. All independent variables were categorical.

All statistical analyses were carried out using R Statistical Software (version 4.3.0; R Foundation for Statistical Computing, Vienna, Austria). All statistics were two-sided, and the significance level was set at 0.05.

3 Results

3.1 Characteristics of the Study Population

The sociodemographic and health-related characteristics of the study sample are presented in Tables 1 and 2. The composition of the sample (n = 1631) closely approximated that of the Hungarian population regarding age, gender, education, employment, marital status, place of residence and geographical region. Nonetheless, there were small deviations; participants with secondary education or those aged 75 years or over were somewhat underrepresented, while those with a college/university degree were slightly overrepresented. More than two-thirds of the sample (67.4%) self-reported having a physician-diagnosed chronic health condition.

Table 1 Mean EQ-5D-5L, SF-6D and PROPr utilities and EQ VAS scores by age and gender groups
Table 2 Mean EQ VAS scores and EQ-5D-5L, PROPr and SF-6D utilities according to sociodemographic and health-related characteristics

3.2 Reported Health Problems by Domains

The distribution of the responses on the domains of each measure is presented in Online Resources 2–10, first for the total sample, and then separately for males and females.

Generally, the most commonly reported problem was pain/discomfort on the EQ-5D-5L (43.8%), sleep disturbance on the PROPr (93.8%) and vitality on the SF-6D (87.1%; Fig. 1). In contrast, respondents experienced the fewest problems in EQ-5D-5L self-care (7.5%), PROPr physical function (39.1%) and SF-6D role limitations (37.8%).

Fig. 1 Proportion of respondents reporting problems in health domains of three preference-accompanied measures by gender. Pearson’s χ² test was performed where a health domain was covered by more than one instrument; corresponding domain groups with a significant difference in the relative frequency of domain responses (p < 0.05) are marked with “a”. Pearson’s χ² test was also performed to assess the difference between genders in each health domain of all three instruments; domains with a significant difference between the female and male subsamples (p < 0.05) are marked with “b”. PROPr, Patient-Reported Outcomes Measurement Information System-preference scoring system; SF-6D, Short-Form 6-Dimension

With advancing age, problems increased significantly in physical function, self-care, usual activities/role limitations and pain/discomfort for all measures (Fig. 2). For mental health domains in all measures, problems significantly decreased with age. No clear trend could be detected for SF-6D vitality, although the difference between the age groups was statistically significant. Problems tended to decrease significantly with age for PROPr fatigue, before rising sharply in the oldest age group. PROPr cognitive function showed a significant U-shaped curve. No significant difference was observed for the PROPr sleep disturbance domain. For the social functioning/roles domains, problems significantly increased after the 35–44-year age group for the PROPr, while they tended to decrease with age for the SF-6D.

Fig. 2 Proportion of respondents reporting problems in health domains of three preference-accompanied measures by age group. Pearson’s χ² test was performed to assess the difference between age groups. All domains where p-values were < 0.05 are marked with “a” for EQ-5D-5L, “b” for PROPr and “c” for SF-6D. PROPr, Patient-Reported Outcomes Measurement Information System-preference scoring system; SF-6D, Short-Form 6-Dimension

Concerning the corresponding health domains, a noticeable trend was observed, with respondents indicating the most problems on the SF-6D, followed by PROPr, and the fewest problems on the EQ-5D-5L. The most problems in physical function were reported on SF-6D (57.1%), whereas 39.1% of the respondents reported any problems on PROPr and 29.6% on EQ-5D-5L. In usual activities/role limitations, more participants had problems on SF-6D (37.8%) compared to EQ-5D-5L (21.2%). Participants reported the most problems with pain/discomfort on SF-6D (66.1%), followed by PROPr (49.2%) and EQ-5D-5L (43.8%). The same order was observed in the area of mental health: almost three-quarters of respondents experienced mental health problems on the SF-6D (74.6%), while this was true for 55.9% on the PROPr and 33.9% on the EQ-5D-5L. As for fatigue/vitality, more problems occurred on the SF-6D (87.1%) than on the PROPr (74.7%). The only exception was social functioning/roles, where 61.1% of participants reported problems on PROPr, compared with only 41.8% on the SF-6D. Concerning other unique domains specific to each instrument, 7.5% reported any problems on EQ-5D-5L self-care, 93.8% on PROPr sleep disturbance and 63.5% on PROPr cognitive function. Out of a total of 18 domains of the three instruments, females had more problems than males in 16 domains (except for EQ-5D-5L mobility and self-care), 13 of which were statistically significant (Fig. 1).
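To illustrate the kind of comparison reported above, the sketch below computes Pearson’s χ² statistic for the physical function domain across the three measures. It is illustrative only: the counts are reconstructed from the rounded percentages and n = 1631, so they are approximations rather than the study data; the three measures are treated as separate groups, as Pearson’s test assumes; and the closed-form p-value holds only for two degrees of freedom. Function names are hypothetical.

```python
from math import exp

N = 1631  # analysed sample size reported in the text


def chi_square(table):
    """Pearson's chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat: float) -> float:
    """Upper-tail p-value of a chi-square distribution with exactly 2 df."""
    return exp(-stat / 2)


def counts_from_percent(pct_with_problems: float, n: int = N):
    """ASSUMPTION: approximate counts rebuilt from a rounded percentage."""
    with_problems = round(n * pct_with_problems / 100)
    return [with_problems, n - with_problems]


# Physical function: SF-6D 57.1%, PROPr 39.1%, EQ-5D-5L mobility 29.6%.
table = [counts_from_percent(p) for p in (57.1, 39.1, 29.6)]
stat, df = chi_square(table)
print(f"chi2 = {stat:.1f}, df = {df}, p = {p_value_df2(stat):.3g}")
```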

Mean LS data are presented in Online Resources 11–12. When considering the corresponding domains, the trends were almost identical to those observed when comparing the proportion of problems across domains. Participants had significantly higher mean LS on SF-6D domains, followed by PROPr and EQ-5D-5L. Physical function was an exception, where SF-6D had the highest and PROPr the lowest mean LS. In the domains where females reported more problems than males, they also had a significantly higher mean LS.

3.3 Respondents Reporting the Best Possible Health

A total of 40.2% of the respondents had the best possible health on the EQ-5D-5L, 2.3% on the PROPr and 5.5% on the SF-6D. In the total sample, the proportion of respondents reporting the best possible health state on the EQ-5D-5L increased slightly between 18 and 44 years and started to decline steeply from the 45–54 age group (45.2%) onwards, reaching its lowest value in the 75+-year age group (20.0%; Fig. 3). In the case of SF-6D and EQ VAS, the proportion of respondents indicating the best possible health declined as age progressed, starting from 13.5% and 8.5% in the 18–24-year-old age group and decreasing to 1.8% and 3.9%, respectively. No substantial difference could be found between age groups in the proportion of respondents with the best possible health on PROPr, with 1.4% of the 18–24-year-old and 3.6% of the 75+-year-old age group having the best possible health. Similar trends were observed when the results were stratified according to gender.

Fig. 3 Proportion of respondents in the best possible health by age and gender groups. PROPr, Patient-Reported Outcomes Measurement Information System-preference scoring system; SF-6D, Short-Form 6-Dimension; EQ VAS, EuroQol Visual Analogue Scale

3.4 Mean EQ VAS Scores and EQ-5D-5L, PROPr and SF-6D Utilities by Sociodemographic and Health-Related Characteristics

The mean EQ VAS score was 77.81 (95% CI 76.87–78.75) in the total sample, and the mean utility was 0.900 (95% CI 0.891–0.908) with the EQ-5D-5L, 0.535 (95% CI 0.523–0.547) with the PROPr and 0.755 (95% CI 0.748–0.762) with the SF-6D (Table 1). Males had significantly higher utilities with EQ-5D-5L, PROPr and SF-6D, while the difference between genders was not significant with EQ VAS. In contrast, the difference between age groups was significant with EQ VAS and EQ-5D-5L, with older respondents having lower values, whereas no difference could be detected with PROPr and SF-6D. Values in age groups ranged between 71.87 (75+ years) and 81.23 (18–24 years) for the EQ VAS, 0.854 (75+ years) and 0.936 (18–24 years) for the EQ-5D-5L, 0.496 (75+ years) and 0.553 (65–74 years) for the PROPr and 0.727 (75+ years) and 0.770 (45–54 years) for the SF-6D. On average, females had lower mean utilities in all age groups using all measures. The difference between genders was not statistically significant in any age group on the EQ VAS. It was significant in the 18–24-, 35–44- and 45–54-year-old age groups on the EQ-5D-5L, and in all but two age groups on the PROPr (the exceptions being the 18–24- and 35–44-year-olds) and on the SF-6D (the exceptions being the 18–24-year-old and 75+-year-old groups).

Having a higher level of education (all instruments), having a higher per capita net monthly income in their household (all), being married or widowed or being in a domestic partnership (PROPr), being a student, being employed (all), being a homemaker/housewife (EQ VAS, EQ-5D-5L and SF-6D), being retired (PROPr), having a better self-perceived health status (all) and never having smoked (EQ-5D-5L and SF-6D) were associated with better health (Table 2). Participants who had a history of chronic illness, did less than 150 min of physical activity weekly, took more medications regularly, and were underweight, overweight or obese had significantly lower utilities on all instruments, as did those living in villages (EQ-5D-5L, PROPr and SF-6D) or in Eastern Hungary (PROPr), and informal caregivers (PROPr and SF-6D). Although the difference between subgroups was significant in the case of alcohol consumption for all measures, no clear trend of the mean utilities could be detected.

The mean utilities for different chronic health conditions can be found in Table 3. Healthy respondents had the highest mean utility for all instruments. PROPr yielded the lowest mean utilities in all health condition groups, while EQ-5D-5L yielded the highest in 28 out of 30 groups, the exceptions being liver cirrhosis and stroke, where mean SF-6D utilities were higher than mean EQ-5D-5L utilities. Participants with thyroid disease exhibited the highest mean EQ-5D-5L utilities (0.896) and EQ VAS scores (75.40), while those with hypertension had the highest mean PROPr (0.485) and SF-6D utilities (0.718). The lowest mean EQ-5D-5L and PROPr utilities were observed in those with liver cirrhosis (0.498 and 0.220, respectively), and the lowest mean EQ VAS score and SF-6D utility were noted in those having other mental health conditions (53.92 and 0.578, respectively).

Table 3 Mean EQ VAS scores and EQ-5D-5L, PROPr and SF-6D utilities according to chronic health conditions

3.5 Predictors of EQ VAS Scores and EQ-5D-5L, PROPr and SF-6D Utilities

Table 4 shows the results of the multivariate linear regression of EQ VAS scores and EQ-5D-5L, PROPr and SF-6D utilities. Females had significantly higher EQ VAS scores, but lower PROPr and SF-6D utilities, than males; EQ-5D-5L utilities did not differ significantly between genders. The 25–34-year-olds had lower utilities with the EQ-5D-5L and with the SF-6D than the 18–24-year-old age group; however, the 45–54-year-old, 55–64-year-old and 65–74-year-old age groups had significantly higher utilities than the youngest age group with PROPr. Having a lower level of education (EQ-5D-5L and PROPr), being unemployed (EQ-5D-5L, EQ VAS) or a disability pensioner (EQ-5D-5L), practising less than 150 min of weekly physical activities (all measures), taking five or more types of medication regularly (all measures), consuming alcohol daily (PROPr), and being underweight or obese (SF-6D) were associated with significantly lower values. Married respondents or those in a domestic partnership had higher utilities than those who were single (PROPr).

Table 4 Multivariate linear regression of EQ VAS scores and EQ-5D-5L, PROPr and SF-6D utilities

Out of 26 chronic health conditions, 10 were associated with significantly lower SF-6D utilities. The corresponding figures for EQ VAS score, PROPr and EQ-5D-5L utilities were 9, 9 and 4, respectively. Musculoskeletal diseases and mental health conditions other than anxiety and depression were the only two chronic health conditions significantly associated with lower values on all measures. Hyperlipidaemia, cancer (including leukaemia and lymphoma), headache, anxiety and depression were associated with lower values on three out of four measures. Other mental health conditions had the largest impact on the EQ VAS scores and EQ-5D-5L utilities (beta = −9.657 and −0.104), cancer (including leukaemia and lymphoma) on the PROPr utilities (beta = −0.105) and musculoskeletal diseases on the SF-6D utilities (beta = −0.065). These sociodemographic and health-related variables explained 28.50% of the variance of the EQ VAS, 39.46% of the EQ-5D-5L, 34.05% of the PROPr and 35.78% of the SF-6D values.

4 Discussion

This study has established population norms for the EQ-5D-5L, PROPr and SF-6D measures in Hungary. To our knowledge, this is the first study at an international level to simultaneously present EQ-5D-5L, PROPr and SF-6D population norms and provide health utilities for 30 chronic physical and mental health conditions on these three outcome measures. Nearly 60% of the respondents indicated health problems on the EQ-5D-5L, the most common being pain/discomfort. As for the PROPr and SF-6D, over 95% and 90% of the respondents reported some health problems, with sleep disturbance and vitality problems being the most frequent, respectively. Males had higher utilities on all measures. Interestingly, females showed significantly higher EQ VAS scores in the linear regression, yet indicated more problems in every health domain where the difference between genders was significant. Older respondents had significantly lower utilities with EQ-5D-5L, but the difference between age groups was not significant with the PROPr and SF-6D. In addition to age and gender, several sociodemographic and health-related variables were associated with utilities, including level of education, employment, net income per capita, physical activity, medication use and BMI, depending on the instrument. A total of 15.4–42.3% of chronic health condition groups were associated with the health utilities.

Our results concur with similar EQ-5D-5L population norm studies from surrounding countries (Bulgaria, Poland, Romania and Slovenia), all indicating declining health with advancing age [37, 58,59,60]. Comparable to the Hungarian population (43.8%), populations of these countries also reported the most problems in the pain/discomfort domain, varying between 39.2% (Bulgaria) and 81.6% (Poland). The proportion of respondents experiencing the best possible health with EQ-5D-5L was somewhat higher in Romania (50.3%) and Poland (52.0%) than in our study (40.2%). Interestingly, in our study, older generations had relatively fewer problems in the anxiety/depression domain than their younger counterparts, which corroborates the findings of the Slovenian and Romanian studies. Conversely, the Bulgarian and Polish populations demonstrated an opposite trend. Unlike in our study, lower average utilities among females than among males were not clearly observed in other studies. Regarding SF-6D and PROPr, results from surrounding countries could not be compared with the results of this study, as they are not available. Consistent with other SF-6D population norms [35, 61,62,63,64,65,66,67], females had lower utilities than males. However, the association between utilities and age was generally inconsistent across SF-6D population norms. While health generally declined with advancing age in most studies [61,62,63,64,65,66], Japan [35] and Hong Kong [67], similarly to our study, displayed no clear trend. Similar to our results (87.1%), most problems were reported in vitality in other countries, ranging from 57.8% (Brazil) to 92.9% (Hong Kong). Other research comparing the EQ-5D-5L and SF-6D population norms found similar trends, including a higher proportion of any problems on the SF-6D than on the EQ-5D-5L and higher mean EQ-5D-5L utilities than SF-6D utilities [35, 36]. Nevertheless, it is important to acknowledge that the comparability of these results is limited due to the different modes of administration (e.g. face-to-face interview versus online panel) and the different value sets. It is also worthwhile to briefly compare our results with the recently published Hungarian population norm of the 15D generic PAM [31]. Although the data collection of the current study took place a year earlier, both studies used an online panel. The 15D population norms showed similar patterns to our results, including improving mental health (i.e. mental function, depression and distress domains) with advancing age.

A substantial proportion of respondents reported being in the best possible health on the EQ-5D-5L (40.2%). In contrast, merely 2.3% and 5.5% were in the best possible health on PROPr and the SF-6D, respectively. Across the corresponding health domains, the highest prevalence of problems was generally reported on the SF-6D, followed by PROPr and the EQ-5D-5L. At the same time, PROPr yielded the lowest and the EQ-5D-5L the highest health utilities in 28/30 chronic health conditions. These results may be attributed to differences in item/domain content, type of response scales, valuation methods for the utilities (composite time trade-off for the EQ-5D-5L and standard gamble for PROPr and the SF-6D), as well as the characteristics of the value sets. When considering item/domain content, the EQ-5D-5L is conceptualised around the absence of any problems, or “full health”. In contrast, SF-6D and PROPr domains are conceptualised around “positive health”, using positively worded frequency scales, such as SF-6D vitality (e.g. “a lot of energy all of the time”), as well as frequency labels indicating that certain problems are never experienced, such as PROPr anxiety (e.g. “I never felt fearful”, “I never felt uneasy”).

Considering the EQ-5D-5L’s wider utility range (−0.848 to 1) compared with PROPr (−0.022 to 0.954) and the SF-6D (0.301 to 1), one may initially anticipate a specific order of sample means: SF-6D > PROPr > EQ-5D-5L. However, value set characteristics, such as the theoretical density distribution of values along the utility scale, also impact mean values [68]. Theoretical EQ-5D-5L and SF-6D values exhibit a symmetric distribution, with the EQ-5D-5L having a wider range. In contrast, PROPr values are skewed, primarily falling between 0 and 0.5. In a general population sample, respondents predominantly occupy the upper (right-hand) end of the utility scale. Consequently, the EQ-5D-5L, with the highest number of theoretical values above 0.8, demonstrates the highest mean, while PROPr, with the majority of its theoretical values below 0.5, exhibits the lowest mean. Furthermore, since the value sets were developed in different countries and each reflects the preferences of the respective population, systematic differences may arise from variations in sociodemographic and economic characteristics, as well as cultural values [69].
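
The point that the density of values, rather than the range alone, drives the sample mean can be illustrated with a minimal sketch. The two value sets and the respondent counts below are invented purely for illustration; they are not the EQ-5D-5L, PROPr or SF-6D value sets, nor data from this study.

```python
# Hypothetical utilities for seven health states, ordered from worst to best.
# Invented numbers: NOT taken from any published value set.
wide_range_top_heavy = [-0.8, -0.2, 0.4, 0.8, 0.9, 0.95, 1.0]         # wide range, values dense near 1
narrow_range_bottom_heavy = [0.0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.95]      # narrower range, values dense below 0.5


def sample_mean(value_set, state_counts):
    """Mean utility of a sample, given the number of respondents in each state."""
    total = sum(state_counts)
    return sum(u * n for u, n in zip(value_set, state_counts)) / total


# Hypothetical general population sample: most respondents are in mild states.
counts = [0, 1, 4, 10, 20, 30, 35]

mean_wide = sample_mean(wide_range_top_heavy, counts)
mean_narrow = sample_mean(narrow_range_bottom_heavy, counts)

print(f"wide range, top-heavy values:      {mean_wide:.3f}")
print(f"narrow range, bottom-heavy values: {mean_narrow:.3f}")
```

Under these assumed numbers, the value set with the lower minimum still produces the higher sample mean, because the mild states that dominate a general population sample are assigned values close to 1.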

Another important factor that may influence differences in responses is the recall period. The EQ-5D-5L asks respondents about their health today, whereas for PROPr the recall period mostly spans 7 days, and the SF-6D uses an even longer recall period of 4 weeks. Previous research has shown that longer recall periods are associated with reporting more health problems [70], which is in line with our findings. In most health domains, respondents reported the most problems on the SF-6D, followed by PROPr and the EQ-5D-5L. Nonetheless, a longer recall period can lead to increased difficulties in remembering health problems [71,72,73], whereas shorter recall periods may result in the systematic underestimation of health problems [71, 74].

Although the EQ-5D-5L assesses five domains of health, the SF-6D six and PROPr seven, certain health aspects might not be fully captured by these PAMs. These aspects may be important for specific patient groups or the general population, consequently limiting the instruments’ content validity in these populations. For instance, the EQ-5D-5L lacks a direct assessment of cognition, sleep, social relations and vitality. PROPr does not measure self-care and usual activities/role limitations, while the SF-6D has no cognition or sleep domain. For the EQ-5D-5L, this gap is being addressed by developing and adding extra items (‘bolt-ons’) to the descriptive system. Previous work from the Netherlands, Switzerland, South Korea and Malaysia suggests that including bolt-ons in the descriptive system reduces the ceiling effect in general population surveys [75,76,77,78]. In the future, these bolt-ons could contribute to providing a more comprehensive description of the population’s health status. Despite these advantages, the inclusion of bolt-ons may undermine the standardisation of the instrument, impacting the comparability of cost-effectiveness estimates.

The choice of instrument is highly dependent on the specifics of the study; therefore, decision-makers and users should take into account study objectives, population characteristics and the context of use. The EQ-5D-5L is the most widely used instrument, with the highest number of country-specific value sets available and robust psychometric properties demonstrated in hundreds of studies [12], and is the preferred instrument in many national health technology assessment (HTA) guidelines [5, 79]. However, both the SF-6D and PROPr cover some health areas that are not (or only partially) included in the EQ-5D-5L, potentially making them more suitable choices for specific populations, such as those with mental illnesses or sleep disturbances. PROPr is a new initiative, and more evidence is needed to establish its validity, reliability and responsiveness before considering it as a recommended instrument. Furthermore, several studies have raised questions about the validity of PROPr. These concern the validity of positively worded items (e.g. ‘Refreshing sleep’), the valuation methods and design employed in developing the value set, and face validity, particularly in relation to mean utilities around 0.50 (the midpoint of the QALY scale) in general population samples [26, 68, 80]. As for the SF-6D, it is important to note that we used the SF-36v1, enabling the estimation of SF-6Dv1 utilities. A new SF-6Dv2 has been developed more recently, addressing criticisms of the previous version, such as the somewhat confusing severity ordering of the physical functioning domain or the positively phrased vitality domain in comparison to the other domains [81]. The UK value set for the SF-6Dv2 was derived using a discrete choice experiment (DCE) with duration, in contrast to the standard gamble used for the SF-6Dv1 [82]. Nevertheless, in some studies across diverse populations, the SF-6Dv1 demonstrated comparable validity to the EQ-5D-5L [83,84,85,86]. 
Comparing the outcomes of cost–utility analyses using these instruments appears to be an important future research direction. The EQ-5D-5L, with its broader utility range, stronger construct validity and responsiveness of utilities, seems more suitable for HTA purposes than the SF-6Dv1 or PROPr [26, 87,88,89]. Another aspect to consider when selecting an instrument for a study is the number of items. The EQ-5D-5L comprises 6 items (including the EQ VAS), while PROPr requires at least 14 items, and the SF-6D is derived from 12 (SF-12) or 36 (SF-36) items. In clinical trials, where batteries typically include multiple instruments, longer questionnaires can significantly increase patient burden, potentially resulting in a higher number of missing responses.

The results of this study must be considered in light of some limitations. First, collecting data from an online panel might be prone to selection bias, especially in the older generations, as they are less likely to have internet access or use the internet regularly [90,91,92]. Moreover, our sample included only a small number of respondents aged 75 years or older; thus, their representation was limited. Second, data were collected during the coronavirus disease 2019 (COVID-19) pandemic, which might have influenced participants’ responses. However, a pre-COVID large general population study showed similar responses on a 5-point excellent-to-poor health scale regarding self-perceived health status (first question of the SF-36) [93]. Third, due to the lack of a Hungarian value set for PROPr and the SF-6D, the US and UK value sets were used, which do not necessarily reflect the preferences of the Hungarian population. Fourth, the three instruments were administered in a fixed order. However, several studies have found that the order of instruments typically has only a marginal or small effect on the responses in longer surveys [94,95,96]. Fifth, the generalisability of the results related to some chronic health conditions is limited due to their relatively low prevalence among respondents. Lastly, 67.4% of the study sample self-reported having a physician-diagnosed chronic health condition, whereas according to the EHIS, only 48.0% of the Hungarian population had any chronic condition [50]. This is likely because our questionnaire provided a more detailed list of different health conditions. Future studies using paper-based surveys are warranted to achieve a better representation of the least developed regions, marginalised populations and the elderly.

5 Conclusion

In summary, this study has developed Hungarian reference values for the EQ-5D-5L, PROPr and SF-6D measures, contributing to a better understanding of the health status of the population. Furthermore, as a result of the present study, population norms are now available for a total of five PAMs in Hungary (the EQ-5D-3L, the EQ-5D-5L, the 15D, PROPr and the SF-6D) [30, 31]. Across the three measures, the most problems were reported on the SF-6D, followed by PROPr and the EQ-5D-5L. Internationally, this is the first study to present EQ-5D-5L, PROPr and SF-6D population norms simultaneously in any country. Therefore, while the results are specific to Hungary, they are expected to have relevance outside of Hungary as well.