Key Points for Decision Makers

Our literature review suggests that factors representing cultural beliefs, religion, language, healthcare systems (typology, type of financing) and sociodemographics are associated with differences in preference-based health state valuations for the EQ-5D instruments.

Five clusters of European countries likely to share characteristics relevant to preference-based health state valuations were identified (English-speaking, Nordic, Central-Western, Southern and Eastern European), and supra-national value sets, each combining a selection of country-specific value sets, were estimated for them.

The supra-national value sets could be used as the best approximate value set for countries that lack one, as well as in multi-national trials and regional decision making, to improve the comparability and transferability of outcome assessments in economic evaluations in Europe.

Our method of developing clusters of countries can be applied to other preference-based health-related quality-of-life instruments.

1 Introduction

The quality-adjusted life-year is calculated by multiplying the time spent in a health state by the health-related quality-of-life weight for that health state [1]. The value of health-related quality of life can be estimated using questionnaires in those countries that have already elicited a set of preference weights, often referred to as ‘value sets’ or ‘value tariffs’ [1]. The HUI, 15D, SF-6D and EQ-5D instruments (EQ-5D-3L/EQ-5D-5L and EQ-5D-Y-3L) are examples of such questionnaires, of which the EQ-5D-3L is the most frequently used. The value sets for these measures are usually obtained from a sample of people representative of the country’s general population [2].
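The calculation described above can be sketched in Python as follows. This is a minimal illustration under simple assumptions: the hypothetical `qalys` helper sums duration (in years) multiplied by the utility weight over consecutive health states, the weights are invented for the example, and discounting is omitted.

```python
from typing import Iterable, Tuple


def qalys(spells: Iterable[Tuple[float, float]]) -> float:
    """Sum of duration (years) x utility weight over consecutive health states."""
    total = 0.0
    for duration_years, utility in spells:
        if duration_years < 0:
            raise ValueError("duration must be non-negative")
        total += duration_years * utility
    return total


# Hypothetical example: 2 years at utility 0.8, then 1 year at utility 0.5
print(round(qalys([(2.0, 0.8), (1.0, 0.5)]), 3))  # 2.1
```

In practice the utility weight for each state would come from an EQ-5D value set, which is why the choice of value set directly affects the resulting QALYs.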

There are substantial differences between the existing value sets for the EQ-5D-3L/5L [3,4,5], which are often attributed to societal and cultural differences between countries [6,7,8,9] and justify the use of country-specific value sets [2]. However, some countries do not have their own national value sets, which hinders both the conduct of robust economic evaluations in a national context and the cross-country comparability of the measurement and valuation of outcomes. In addition, only three value sets are currently available in Europe for the youth version of the EQ-5D, the EQ-5D-Y-3L [10,11,12]. Of the 27 European Union member states, the four European Free Trade Association countries (Iceland, Liechtenstein, Norway and Switzerland) and the UK, 16 do not have any value set for the 3L or 5L version of the EQ-5D (Austria, Bulgaria, Croatia, Cyprus, Czech Republic, Estonia, Greece, Iceland, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Slovakia, Norway, Switzerland). This number is likely to change; Norway, for instance, is expected to have its first value sets soon [13]. It is currently common practice to use the UK value sets, as they are often described as an “accepted approximation” [14], “recommended as most robust” [15] and “most commonly used” [16]. In some countries lacking national value sets, the health technology assessment guidelines for conducting cost-effectiveness and cost-utility analyses recommend applying the UK value sets [13, 17]. Our own research conducted prior to this study showed that from 2010 to 2019 the most commonly used value sets in these countries were the UK or European visual analogue scale [18] value sets (Fig. A1 of the Electronic Supplementary Material [ESM]). However, there is a lack of evidence supporting the choice of any particular substitute value set as the best possible approximation for the target population. Moreover, with the growing number of new instruments, the need to borrow value sets across countries is likely to remain a relevant topic.

To address the lack of value sets in some countries and to explore the heterogeneity that might stem from cultural and contextual factors, we propose the development of supra-national value sets for the EQ-5D instruments. In addition to providing the best possible proxy value sets for countries that lack one, supra-national value sets are increasingly needed for comparability in multi-national studies and in regional procurement settings, for example for drug pricing and reimbursement. Many initiatives aiming to ensure access to safe, effective, high-quality and affordable essential medicines have emerged at the European level [19]. Regional initiatives such as Beneluxa, the Valletta Declaration, the Baltic Procurement Initiative and the Nordic Pharmaceutical Forum [19] aim to exchange strategic information and to support joint negotiations on drug reimbursement and pricing. These collaborations are likely to drive further cross-border cooperation and evidence-based decision making, such as jointly written health technology assessment reports.

Hence, the first aim of this study was to identify factors contributing to differences in preference-based health state valuations with the EQ-5D in order to establish a conceptual framework for the development of supra-national value sets in Europe. The second objective was to estimate supra-national value sets for clusters of European countries that are homogeneous with respect to these factors.

2 Methods

2.1 Development of a Conceptual Framework

The conceptual framework was developed in four steps. First, information from peer-reviewed literature was collected to identify factors influencing EQ-5D health state valuations. Second, the identified factors were assessed against pre-selected criteria for their suitability for cluster development. Third, grouping variables were selected based on existing classifications, and the countries were assigned to relevant categories within these grouping variables. Finally, clusters of homogeneous countries were developed based on how often countries appeared in the same grouping category.

2.2 Literature Review

A targeted review of health economic literature relating to EQ-5D instrument (EQ-5D-3L, EQ-5D-5L) health state valuations was conducted between October and December 2019 and updated in May 2020. The objective of the review was to identify which factors influenced differences in EQ-5D health state valuations across populations. Although the focus of this study was on the 27 European Union member states plus the European Free Trade Association countries (Iceland, Liechtenstein, Norway and Switzerland) and the UK, the literature review was not restricted to European countries, to particular languages or to a time period, so as to allow a comprehensive assessment of all potential factors in keeping with the exploratory character of this study. The search was undertaken in four databases: Embase, MEDLINE, EconLit and the Social Sciences Citation Index. The search strategy is shown in Table A1 of the ESM. Articles were also obtained by searching the reference lists of the identified studies and by hand searching the EuroQol Group’s website (https://euroqol.org/) as part of the grey literature search. One researcher screened the titles and abstracts, and then the full texts, of the identified studies. All studies that either empirically examined differences in how people valued EQ-5D-3L/5L hypothetical states based on their country of origin or other characteristics (e.g. ethnicity, socio-economic status) or discussed these aspects were considered eligible. Information extracted on potential factors contributing to differences in preference-based valuations with the EQ-5D instruments was collated in tabular form, summarised and analysed in the subsequent steps.

2.3 Assessment of Identified Factors for Their Suitability for Cluster Development

The identified factors were assessed for their suitability and relevance in generating groups of comparable countries for the development of supra-national value sets. The assessment was based on a modified list of criteria originally proposed by Carinci et al. to score health system performance indicators for international comparisons [20]. Four of the six originally proposed criteria were used and adapted for the purpose of this study: (1) validity; (2) reliability; (3) international feasibility; and (4) international comparability (Table A2 of the ESM). The two criteria ‘Relevance’ and ‘Actionability’ were not applied, as they referred to aspects of clinical relevance and quality of care that were deemed not applicable to this study [20]. In the current context, a factor was considered valid if empirical scientific evidence supported a link between the factor and variations in EQ-5D-3L/5L health state valuations. Factors were considered reliable if their assigned country-specific values represented stable phenomena that were not subject to frequent changes over time; unemployment rates, for instance, would not fulfil this criterion. International feasibility meant that values of a factor could easily be derived for international comparisons, while international comparability meant that the definition of the factor was uniform across countries.

The identified factors were assessed by the authors on the basis of the information extracted from the literature. A three-point score was applied to each criterion: 1 point if the factor met the criterion, 0.5 points if it partially met it and 0 points if it did not meet it at all. Factors that met all four criteria fully (1 point) or partly (0.5 points) and reached a sum of more than 2 were selected for the further development of grouping variables.
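A minimal Python sketch of the selection rule, assuming our reading of it: every criterion must score at least 0.5 and the total must exceed the threshold of 2. The function name `select_factors` and the factor names and scores are hypothetical and do not reproduce Table 1.

```python
from typing import Dict, List

CRITERIA = ("validity", "reliability", "feasibility", "comparability")
ALLOWED = {0.0, 0.5, 1.0}  # not met, partially met, fully met


def select_factors(scores: Dict[str, Dict[str, float]], threshold: float = 2.0) -> List[str]:
    """Keep factors with no criterion scored 0 and a total strictly above `threshold`."""
    selected = []
    for factor, by_criterion in scores.items():
        values = [by_criterion[c] for c in CRITERIA]
        if any(v not in ALLOWED for v in values):
            raise ValueError(f"unexpected score for {factor}: {values}")
        if min(values) > 0 and sum(values) > threshold:
            selected.append(factor)
    return selected


# Hypothetical factors and scores, for illustration only
example_scores = {
    "factor_a": {"validity": 1, "reliability": 1, "feasibility": 0.5, "comparability": 1},
    "factor_b": {"validity": 0, "reliability": 1, "feasibility": 1, "comparability": 1},
    "factor_c": {"validity": 0.5, "reliability": 0.5, "feasibility": 0.5, "comparability": 0.5},
}
print(select_factors(example_scores))  # ['factor_a']
```

The strict inequality follows the wording “a sum of more than 2”; with a non-strict comparison, a factor scoring 0.5 on every criterion would also pass.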

2.3.1 Categorisation of Countries Based on Final Grouping Variables

The aim was to categorise countries based on the factors that met the criteria, which required operationalising grouping variables reflecting these factors. We conducted a non-systematic literature search to identify practical classifications that matched the selected factors and could be used as grouping variables. The search was conducted in PubMed and Google without time restrictions, using keywords representing the factors (e.g. religion) combined with terms for possible classifications (e.g. “classification”, “categorisation”, “grouping”, “division”, “classifying system”). After relevant classifications were selected as grouping variables, countries were assigned to their categories.

2.3.2 Cluster Development

For each pair of countries, we counted the number of grouping variables in which both countries were assigned to the same category, and depicted these frequencies graphically. Clusters of countries were derived from these frequencies. To account for the uncertainty around the selection of grouping variables, different scenarios for deriving supra-national clusters were applied, each omitting one of the grouping variables to observe the impact on the resulting clusters.
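The pairwise counting step can be sketched in Python as below. The country labels and category assignments are hypothetical placeholders. The `components` helper, which joins countries linked at or above a chosen frequency into connected groups, is a swapped-in mechanical heuristic for reading clusters off the frequency graph; it is not necessarily how the clusters in this study were derived, and countries without any qualifying link are simply omitted from its output.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# grouping variable -> {country -> category}
Assignments = Dict[str, Dict[str, str]]


def cooccurrence(assignments: Assignments) -> Dict[FrozenSet[str], int]:
    """For each country pair, count the grouping variables that put both in one category."""
    counts: Dict[FrozenSet[str], int] = defaultdict(int)
    for categories in assignments.values():
        for a, b in combinations(sorted(categories), 2):
            if categories[a] == categories[b]:
                counts[frozenset((a, b))] += 1
    return dict(counts)


def components(counts: Dict[FrozenSet[str], int], min_links: int) -> List[Set[str]]:
    """Connected groups of countries, keeping only pairs with >= min_links shared categories."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for pair, n in counts.items():
        if n >= min_links:
            a, b = tuple(pair)
            graph[a].add(b)
            graph[b].add(a)
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        group: Set[str] = set()
        stack = [start]
        while stack:  # depth-first traversal of the link graph
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Hypothetical toy assignments for four placeholder countries
toy = {
    "var1": {"A": "x", "B": "x", "C": "y", "D": "y"},
    "var2": {"A": "p", "B": "p", "C": "q", "D": "q"},
    "var3": {"A": "m", "B": "m", "C": "m", "D": "n"},
}
counts = cooccurrence(toy)
print(counts[frozenset(("A", "B"))])  # 3
print(components(counts, min_links=2))  # two groups: A with B, C with D
```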

2.4 Development of Supra-National Value Sets

Supra-national value sets combined country-specific value sets derived using the same methodology, i.e. the time trade-off (TTO) for the EQ-5D-3L (France, Germany, The Netherlands, Italy, Portugal, Spain, Hungary, Poland, Romania, Slovenia, UK, Denmark) and the standardised EQ-VT protocol for the EQ-5D-5L (France, Germany, The Netherlands, Italy, Portugal, Spain, Hungary, Poland, Denmark, Ireland). Methodological aspects of the EQ-5D-3L and EQ-5D-5L valuation studies are compared in Tables A3 and A4 of the ESM.

Coefficients from published national valuation studies were used to derive ‘saturated value sets’ for both the EQ-5D-3L and the EQ-5D-5L using the methodology developed by Sajjad et al. [21]. Saturated value sets contained utility values for all 243 (3⁵) or 3125 (5⁵) theoretical health states for each country included in the supra-national cluster for the EQ-5D-3L and EQ-5D-5L, respectively [22].
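Generating the full set of health states for a saturated value set is a matter of enumerating every combination of levels across the five dimensions. The Python sketch below assumes each country's published model is available as a scoring function mapping a level tuple to a utility; the helper names and the toy scoring function are hypothetical, and the sketch does not reproduce the specific procedure of Sajjad et al.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

State = Tuple[int, ...]


def all_states(levels: int, dimensions: int = 5) -> List[State]:
    """Every health state as a tuple of levels, from (1, 1, 1, 1, 1) upwards."""
    return list(product(range(1, levels + 1), repeat=dimensions))


def saturate(value_set: Callable[[State], float], levels: int) -> Dict[str, float]:
    """Score every state with one country's model; keys are labels such as '21131'."""
    return {"".join(map(str, s)): value_set(s) for s in all_states(levels)}


def toy_model(state: State) -> float:
    """Hypothetical additive model: 0.1 lost per level above 1 (not a real value set)."""
    return 1.0 - 0.1 * sum(level - 1 for level in state)


print(len(all_states(3)), len(all_states(5)))  # 243 3125
saturated = saturate(toy_model, levels=3)
print(saturated["11111"], round(saturated["33333"], 3))  # 1.0 0.0
```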

Several approaches to modelling these simulated data were tested, including ordinary least-squares, tobit, generalised linear models with a gamma log-link and finite mixture models, as well as models with interaction terms (N3, D1, I2, I32). Goodness-of-fit statistics were compared, including the Bayesian information criterion, Akaike information criterion, root mean square error and pseudo-R². The ordinary least-squares model was the best fitting and most pragmatic, and ordinary least-squares regression was therefore used to estimate the pooled cluster value sets. The dependent variable was the utility value of each of the 243 (EQ-5D-3L) or 3125 (EQ-5D-5L) health states for each country, on a scale on which the best health state has an upper bound of 1 and 0 represents dead. The regressors were dummy variables modelling the shifts between the three (EQ-5D-3L) or five (EQ-5D-5L) levels within each of the five dimensions. Thus, for the EQ-5D-5L, four dummy variables were constructed for the mobility dimension (MO): one measuring the shift from level 1 to level 2 (MO2), one from level 2 to level 3 (MO3), one from level 3 to level 4 (MO4) and one from level 4 to level 5 (MO5). Similar dummy variables were constructed for self-care (SC2, SC3, SC4, SC5), usual activities (UA2, UA3, UA4, UA5), pain/discomfort (PD2, PD3, PD4, PD5) and anxiety/depression (AD2, AD3, AD4, AD5). For the EQ-5D-3L, two dummy variables were constructed for mobility, one measuring the shift from level 1 to level 2 (MO2) and one from level 2 to level 3 (MO3), with analogous dummies for self-care (SC2, SC3), usual activities (UA2, UA3), pain/discomfort (PD2, PD3) and anxiety/depression (AD2, AD3).
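The pooled regression can be sketched with NumPy as below. It assumes incremental dummy coding, following the wording above (for example, MO3 equals 1 whenever mobility is at level 3 or worse, so the level-3 decrement is MO2 plus MO3), and it fits an unconstrained intercept; the published specification, including how the constant is coded, may differ. The function names and the simulated country coefficients are hypothetical.

```python
from itertools import product
from typing import Dict, List

import numpy as np

DIMS = ("MO", "SC", "UA", "PD", "AD")


def design_row(state, levels: int) -> List[float]:
    """Intercept plus incremental dummies: e.g. MO3 = 1 if mobility level >= 3."""
    row = [1.0]
    for level in state:
        row.extend(1.0 if level >= k else 0.0 for k in range(2, levels + 1))
    return row


def column_names(levels: int) -> List[str]:
    return ["const"] + [f"{d}{k}" for d in DIMS for k in range(2, levels + 1)]


def design_matrix(levels: int) -> np.ndarray:
    """One row per health state, in itertools.product order."""
    states = product(range(1, levels + 1), repeat=5)
    return np.array([design_row(s, levels) for s in states])


def pooled_ols(country_utilities: List[np.ndarray], levels: int) -> Dict[str, float]:
    """Stack the saturated value sets of a cluster and fit one OLS model."""
    x_one = design_matrix(levels)
    x = np.vstack([x_one] * len(country_utilities))
    y = np.concatenate(country_utilities)
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return dict(zip(column_names(levels), beta))


# Hypothetical cluster of two countries whose utilities are exactly linear in the dummies
rng = np.random.default_rng(0)
x3 = design_matrix(3)
beta_a = np.concatenate([[1.0], -rng.uniform(0.0, 0.2, size=10)])
beta_b = np.concatenate([[1.0], -rng.uniform(0.0, 0.2, size=10)])
fit = pooled_ols([x3 @ beta_a, x3 @ beta_b], levels=3)
print({name: round(float(value), 3) for name, value in fit.items()})
```

Because every country contributes utilities for the same complete set of states, the stacked design matrix is the same block repeated, and the pooled OLS coefficients equal the simple average of the country-specific OLS coefficients. Each country in a cluster therefore receives equal weight, regardless of its original sample size.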

3 Results

3.1 Literature Review

Searches of the databases generated 881 references which, after de-duplication, amounted to 506 unique records. An additional 38 potentially relevant studies were retrieved from the reference lists of identified studies and four additional studies were obtained through a grey literature search. Following the screening of titles and abstracts and then the full texts of potentially eligible studies, 69 articles were included for data extraction (Fig. A2 of the ESM). We found 31 empirical studies that had an explicit aim to explore differences in how people value different health states based on their country of origin or other characteristics, for example, ethnicity and socio-economic status. The remaining studies (n = 38) included comparisons of value sets, conceptual/methodological articles, and articles exploring how valuation tariffs impact quality-adjusted life-years (Table A5 of the ESM).

3.2 Assessment of the Identified Factors and Proposed Grouping Variables

Ten factors potentially contributing to differences in the EQ-5D-3L/5L value sets were extracted: cultural differences, language differences/translation issues, methodological differences in value set development, healthcare system differences (healthcare system typology, financing system), economic differences, sociodemographic differences, religion, racial/ethnic differences, geographical proximity and environmental differences (Fig. 1). The majority of the identified studies (70%) concerned the 3L version of the EQ-5D. At the time of the review, no value set for the EQ-5D-Y-3L was available.

Fig. 1

Number of studies mentioning possible factors influencing cross-country differences in EQ-5D valuations (n = 69). HCS healthcare system

Following the assessment of the identified factors for their relevance in cluster development, the variables that reached the cut-off score of 2 with no criterion scored 0 were: cultural differences, religion, language, healthcare system typology, healthcare system financing and sociodemographic aspects (Table 1). The variables ethnicity, economic status, geographical proximity and environmental factors were excluded because of potential weaknesses in validity (i.e. no concrete evidence was found that these variables influence health state valuations) and reliability (i.e. they might not represent stable phenomena over time).

Table 1 Assessment of factors associated with differences in valuation of health states

3.3 Culture and Religion: Grouping Variable 1

Several studies suggest that countries that are culturally alike are also likely to value health more similarly than countries with substantially different cultural backgrounds [23,24,25,26,27,28,29]. Studies identified in the literature search showed that using Hofstede’s definition of culture produced mixed results with respect to health state valuations [23,24,25], suggesting that a different definition of culture/cultural beliefs should be adopted. Therefore, in this study, the work of Huntington [30] and Inglehart and Baker [31] was used to define the first grouping variable. Following their thesis that the cultural heritage of a society is shaped by religious traditions, eight “cultural zones” were established, of which five are relevant for Europe: English-speaking, Protestant Europe, Catholic Europe, Baltic and Orthodox. Inglehart and Baker added an “Ex-communist” zone (Table 2). This extended classification was considered the most accurate in the context of this study, which included former communist states. Our review also showed that religious aspects are important in the context of health state valuations [32,33,34,35].

Table 2 Proposed grouping variables for European Union, European Free Trade Association countries and the UK

3.4 Language: Grouping Variable 2

The literature search showed that linguistic variations influenced how respondents interpreted and valued health states [26, 35,36,37,38,39,40]. Studies found, for instance, that labels used in the Likert scale of the EQ-5D may be interpreted differently in different languages [37], and that nuances in wording, especially in the level labels, may differ between languages [35, 36]. Over 140 languages are spoken in Europe [41]. Indo-European languages, such as the Romance, Germanic, Slavic, Baltic and Greek languages, are the most widely spoken; other languages include, for example, West-Central Semitic (e.g. Maltese) and Uralic languages (e.g. Finnish, Estonian, Hungarian) [42]. Table 2 shows the distribution of languages spoken in Europe on which the second grouping variable was based.

3.5 Healthcare System Typology: Grouping Variable 3

Studies have shown that the organisation of healthcare systems, including social support systems, could be a factor associated with assigning different preference values to given health states in valuation studies [33, 43,44,45]. For instance, Devlin et al. reported that some respondents’ valuations of hypothetical states are contingent on access to appropriate care and support for the person in that state [33]. Healthcare system typology was therefore considered the next relevant and suitable grouping variable in this study. A recent study by Ferreira et al. proposed a new classification of healthcare systems in the European Union, based on methods including factor and cluster analyses; the resulting healthcare system type categories [46] were adopted in this study (Table 2).

3.6 Healthcare System Financing: Grouping Variable 4

The fourth grouping variable reflected the healthcare system classification of the OECD countries described by Böhm et al. and Wendt et al. [47, 48] (Table 2). This classification was extracted from the publication of Ferreira et al. [46], who provided an overview of existing healthcare system typologies. The classification distinguishes dimensions of healthcare systems with respect to regulation, financing and service provision by three types of actors: state, societal and private. Healthcare system financing was included as a separate grouping variable because it might play an important role in how preference values are assigned to health states. For instance, one study showed that Singaporean Chinese had greater disutility for very poor health states than mainland Chinese [44]. The authors explained that in mainland China a substantial proportion of health expenses is covered by health insurance, whereas Singaporean participants commented that they would prefer to die rather than become a burden to their families [44].

3.7 Sociodemographic Factors: Grouping Variable 5

The last grouping variable was based on the premise that sociodemographic aspects are associated with differences in preferences for health states, including poverty status [49], geographical region within a country [49, 50], study site (urban vs rural areas) [45, 50], level of education attained [27, 49], marital status [50, 51], sex [52] and age [51, 52]. According to Dolan, age, marital status and sex emerge as three of the most important factors that might explain TTO values [53]. The fifth grouping variable was derived from a study by Palevičienė and Dumčiuvienė that used 25 regional indicators to identify clusters of countries with sociodemographic similarity within the European Union [54]. Because some countries relevant to this study were not included in the original classification by Palevičienė and Dumčiuvienė, these countries were assigned to categories based on similarities identified in other classifications provided by Figueras et al. [55] and Genova [56] (extracted from the publication of Ferreira et al. [46]) (Table 2).

3.8 Selection of Clusters

The number of grouping variables in which each pair of countries was placed in the same category is shown in Table A6 of the ESM. Figure 2 presents the links between all pairs of countries that were assigned to the same category three, four or five times. The thickness and colour of each connecting line depend on the number of times the two countries were assigned to the same category; pairs that shared a category only once or twice are not shown, for simplicity. For example, Sweden and Denmark were assigned to the same category for all five grouping variables, which is represented by the thickest black line. The analysis of these frequencies revealed five cohesive clusters: English-speaking, Nordic, Central-Western, Southern and Eastern European. The Central-Western cluster currently combines the value sets from Germany, France and The Netherlands for both the EQ-5D-3L and EQ-5D-5L. The Southern cluster consists of the value sets from Portugal, Spain and Italy for the EQ-5D-3L and EQ-5D-5L. The Eastern European cluster consists of the value sets from Poland, Hungary, Romania and Slovenia for the EQ-5D-3L and from Poland and Hungary for the EQ-5D-5L. The English-speaking cluster contains the UK value set for the EQ-5D-3L and the Irish value set for the EQ-5D-5L. The Nordic cluster currently has only one value set, from Denmark, for each of the EQ-5D-3L and EQ-5D-5L (Table 3).

Fig. 2

Identified country clusters for the development of supra-national value sets. Figure 2 presents links between all pairs of countries that were assigned to the same category within grouping variables presented in Table 2 three times (light grey), four times (dark grey) or five times (black). 3L EQ-5D-3L, 5L EQ-5D-5L, AUT Austria, BEL Belgium, BUL Bulgaria, CRO Croatia, CYP Cyprus, CZE Czech Republic, DEN Denmark, EST Estonia, FIN Finland, FRA France, GER Germany, GRE Greece, HUN Hungary, ICE Iceland, IRE Ireland, ITA Italy, LAT Latvia, LIT Lithuania, LUX Luxembourg, MAL Malta, NED The Netherlands, NOR Norway, POL Poland, POR Portugal, ROM Romania, SLO Slovakia, SLV Slovenia, SPA Spain, SWE Sweden, SWI Switzerland, TTO time trade-off

Table 3 Proposed supra-national clusters

3.9 Sensitivity Analysis of the Cluster Selection

Six sensitivity-analysis scenarios were analysed to observe the impact of the selected grouping variables on the proposed clusters (Figs. A3–A8 of the ESM). In Scenarios 1–5, one of the grouping variables was omitted and the frequency of country links was re-calculated. In Scenario 6, countries that were not included in the original classification systems of the given grouping variables (marked in italics in Table 2) were removed.
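The sensitivity scenarios amount to recomputing the pairwise counts with one grouping variable left out (Scenarios 1–5) or with selected countries removed (Scenario 6). The Python sketch below uses hypothetical placeholder assignments and helper names; the threshold passed as `min_links` is an assumption, since the number of shared categories required to keep a link after dropping a variable is a modelling choice.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Optional, Set

Assignments = Dict[str, Dict[str, str]]  # grouping variable -> {country -> category}


def pair_counts(assignments: Assignments, skip: Optional[str] = None) -> Dict[FrozenSet[str], int]:
    """Shared-category counts per country pair, optionally ignoring one grouping variable."""
    counts: Dict[FrozenSet[str], int] = {}
    for variable, categories in assignments.items():
        if variable == skip:
            continue
        for a, b in combinations(sorted(categories), 2):
            if categories[a] == categories[b]:
                key = frozenset((a, b))
                counts[key] = counts.get(key, 0) + 1
    return counts


def leave_one_out(assignments: Assignments, min_links: int) -> Dict[str, Set[FrozenSet[str]]]:
    """Scenarios 1-5 analogue: links reaching min_links after dropping each variable in turn."""
    return {
        dropped: {p for p, n in pair_counts(assignments, skip=dropped).items() if n >= min_links}
        for dropped in assignments
    }


def drop_countries(assignments: Assignments, excluded: Iterable[str]) -> Assignments:
    """Scenario 6 analogue: remove countries added outside the original classifications."""
    excluded_set = set(excluded)
    return {
        variable: {c: cat for c, cat in categories.items() if c not in excluded_set}
        for variable, categories in assignments.items()
    }


# Hypothetical toy assignments for four placeholder countries
toy = {
    "var1": {"A": "x", "B": "x", "C": "y", "D": "y"},
    "var2": {"A": "p", "B": "p", "C": "q", "D": "q"},
    "var3": {"A": "m", "B": "m", "C": "m", "D": "n"},
}
for dropped, links in leave_one_out(toy, min_links=2).items():
    print(dropped, sorted(tuple(sorted(p)) for p in links))
```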

The analysis revealed that the (sub)cluster consisting of Cyprus and Greece was the most affected when one of the grouping variables was removed from the frequency analysis. In three of the five leave-one-out scenarios, these two countries formed a separate cluster not connected to the Southern cluster. In three of the five scenarios, Liechtenstein was separated from the Central-Western cluster and Croatia from the Eastern European cluster. In one scenario, Croatia and Slovenia, as well as Latvia and Lithuania, formed separate two-country clusters. Apart from these differences, the proposed clusters remained unchanged irrespective of the scenario applied.

3.10 Supra-National Value Sets

All EQ-5D-3L TTO value sets included in the development of supra-national value sets, with the exception of Hungary and Romania, followed the Measurement and Valuation of Health protocol, which was developed in the UK to elicit health state preferences for the EQ-5D using the TTO method [57]. Most of the countries modified the Measurement and Valuation of Health protocol, especially with respect to the number of health states valued directly and the number of health states valued by each respondent. In Romania and Hungary, only three health states were valued by each respondent. For the EQ-5D-5L, the methodological differences between the valuation studies were less notable. Italy differed from the other studies in its mode of administration, using videoconferencing because of the coronavirus disease 2019 pandemic. Two countries, The Netherlands and Hungary, estimated their value sets using only the TTO data, while the remaining countries opted for the hybrid model.

Supra-national value sets for three clusters (Central-Western, Southern, Eastern European) are presented in Tables 4 and 5. For the two remaining clusters, for which a value set from only one country is currently available (English-speaking, Nordic), that value set may be recommended as the best proxy for other countries within the same cluster. In the calculated supra-national value sets, the constant captures the utility decrement associated with any deviation from full health. Whenever the constant is different from 1 (all supra-national value sets except the EQ-5D-5L sets for the Southern and Eastern European clusters), the corresponding decrement, (1 − constant), should be included in the calculation of the EQ-5D index. For instance, the EQ-5D-3L index for health state 33333 in the Central-Western cluster is calculated as follows, where 0.183 is the (1 − constant) term:

$$U(33333) = 1 - 0.183 - 0.325 - 0.255 - 0.119 - 0.340 - 0.236 = -0.458$$
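The worked calculation above can be expressed as a small Python function. It assumes that the five level-3 coefficients in the example belong to MO, SC, UA, PD and AD in that order, that each dimension at a given level subtracts one coefficient (set `incremental=True` to sum the level-by-level shifts instead), and that the (1 − constant) decrement is applied to every state other than full health. The function name and dictionary keys are hypothetical.

```python
from typing import Dict

DIMS = ("MO", "SC", "UA", "PD", "AD")


def eq5d_index(state: str, coefs: Dict[str, float], any_problem_decrement: float,
               incremental: bool = False) -> float:
    """Index value = 1 - (1 - constant) - dimension decrements; full health scores 1."""
    levels = [int(ch) for ch in state]
    if len(levels) != len(DIMS):
        raise ValueError("expected a five-digit health state such as '21131'")
    if all(level == 1 for level in levels):
        return 1.0
    utility = 1.0 - any_problem_decrement  # (1 - constant) applies to any deviation
    for dim, level in zip(DIMS, levels):
        if level == 1:
            continue
        if incremental:
            # sum the shifts from level 1 up to the observed level, e.g. MO2 + MO3
            utility -= sum(coefs[f"{dim}{k}"] for k in range(2, level + 1))
        else:
            # one coefficient per observed level, e.g. MO3 for mobility at level 3
            utility -= coefs[f"{dim}{level}"]
    return utility


# Level-3 coefficients from the worked example; the dimension order is an assumption
level3 = {"MO3": 0.325, "SC3": 0.255, "UA3": 0.119, "PD3": 0.340, "AD3": 0.236}
print(round(eq5d_index("33333", level3, any_problem_decrement=0.183), 3))  # -0.458
```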

Table 4 Supra-national value sets for the EQ-5D-3L
Table 5 Supra-national value sets for the EQ-5D-5L

3.11 Comparison of National and Supra-National Value Sets

The analysis of differences between the national value sets included in the study as well as supra-national value sets is provided in Tables A7–A9 of the ESM.

3.11.1 National Value Sets

The MO dimension was given the most importance (in terms of the disutility of the worst level within a dimension) in the majority of the EQ-5D-3L value sets, with the exception of The Netherlands, Poland and the UK, while PD was the most important dimension in most of the EQ-5D-5L value sets. The number of health states worse than death varied from 21 (Italy) to 91 (Spain) for the EQ-5D-3L and from 206 (Poland) to 1124 (Ireland) for the EQ-5D-5L. Across all EQ-5D-5L value sets, the largest differences in disutility values were observed in the AD dimension. The loss of utility at level 3 of AD (AD3) was almost seven times greater for Irish respondents (disutility of −0.202) than for Polish respondents (disutility of −0.029), which is consistent with these two countries being assigned to different clusters. The next largest relative differences were observed between Poland (−0.018) and Spain (−0.081) in AD2, Poland (−0.108) and Ireland (−0.535) in AD5, and France (−0.022) and Spain (−0.075) in AD5.
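Counting health states worse than death, and comparing disutilities across value sets, are simple computations over the enumerated states. In the Python sketch below, `toy_value_set` is a hypothetical additive EQ-5D-3L model, not any national value set; the final line only reproduces the ratio of the two AD3 disutilities quoted above.

```python
from itertools import product
from typing import Callable, Tuple

State = Tuple[int, ...]


def count_worse_than_dead(value_set: Callable[[State], float], levels: int) -> int:
    """Number of health states with a utility below 0 (dead)."""
    return sum(1 for s in product(range(1, levels + 1), repeat=5) if value_set(s) < 0)


def toy_value_set(state: State) -> float:
    """Hypothetical 3L model: 0.05 for any problem, 0.1 per level-2, 0.3 per level-3 dimension."""
    if all(level == 1 for level in state):
        return 1.0
    decrement = {1: 0.0, 2: 0.1, 3: 0.3}
    return 1.0 - 0.05 - sum(decrement[level] for level in state)


print(count_worse_than_dead(toy_value_set, levels=3))  # 41

# Ratio of the AD3 disutilities quoted in the text (Ireland vs Poland)
print(round(0.202 / 0.029, 2))  # 6.97
```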

3.11.2 Supra-National Value Sets

For the EQ-5D-3L supra-national value sets, the number of health states worse than death ranged from 40 in the Eastern European cluster to 84 in the English-speaking cluster. In terms of the lowest utilities assigned to level 3, the most important dimension was PD in the Central-Western and English-speaking clusters and MO in the remaining clusters. UA was the least important dimension in all clusters except the Southern cluster. AD was more important in the English-speaking and Nordic clusters than in all remaining clusters, indicating differences between the regions.

For the EQ-5D-5L supra-national value sets, the number of health states worse than death ranged from 308 in the Southern cluster to 1124 in the English-speaking cluster. PD was the most important dimension in the Central-Western, Southern and Eastern European clusters, while AD was the most important in the English-speaking and Nordic clusters. UA was the least important dimension in all EQ-5D-5L clusters.

4 Discussion

This work is part of the PECUNIA project, which aimed to develop harmonised methods for cost and outcome assessment in multi-sectoral, multi-national, multi-person economic evaluations in Europe [58]. The study contributed to that aim by improving the comparability and transferability of outcome assessment in economic evaluations, given the limited availability and the variability of EQ-5D-3L/5L value sets across European countries. On the basis of the existing literature, five country clusters that are cohesive in terms of potential factors influencing preference-based health state valuation with the EQ-5D instruments were proposed, and the resulting supra-national value sets were presented.

In addition to deriving clusters for supra-national value sets, this study contributes to understanding the rationale for using different substitute value sets in countries without one. Currently, the common practice is to use the UK value sets; however, there is a lack of evidence supporting this particular substitution as the best possible approximation for the target population. For the clusters in which more than one country-specific EQ-5D value set is available (Central-Western, Southern and Eastern European), a supra-national value set could be used as the best proxy where no national value set exists.

Another advantage of supra-national value sets is their potential usefulness in multi-national trials conducted within a given European region and in regional reimbursement and drug pricing negotiations. In multi-national economic evaluations, researchers tend to apply a single value set, or a limited number of value sets, to all participating countries to derive utility values for quality-adjusted life-years (e.g. [59]). Using a single value set relevant to the region could increase the international comparability of economic evaluations across the included countries while providing results that are in line with population health preferences. When all necessary national value sets are available, they should take priority over a combined value set.

Furthermore, the conceptual framework and the findings of the underlying literature review are not limited to the European region and could be used for the future development of supra-national value sets in other regions of the world. For instance, several Asian countries have developed value sets for the EQ-5D instruments, with heterogeneity comparable to that observed in Europe [60]. A similar methodological approach could also be applied to other outcome measures with country-specific preference-based value sets, for example, the SF-6D and HUI. Similar investigations in other regions should also test other factors that could be relevant for the selected countries but were not included in the current study, for example, regional differences in gross domestic product per capita.

Some limitations of the proposed approach need to be considered. The underlying assumption of this study is that countries grouped by their broad characteristics on factors influencing health state valuation should be fairly similar in terms of actual preferences. However, some studies showed that heterogeneity in preferences for health states remains even after adjusting for these factors [8, 25, 61,62,63]. In addition, methodological variations in sampling, field administration and subsequent modelling of the national value sets might cause heterogeneity beyond the assessed country characteristics. Our investigation showed that methodological variation in deriving the national value sets was larger for the EQ-5D-3L valuation studies than for the EQ-5D-5L valuations, which followed a stricter protocol and a standardised independent quality-control system [64]. In the current study, we did not control for methodological differences between the value sets; however, the value sets combined in each supra-national value set used the same methodology for deriving health preferences, namely TTO or EQ-VT. Another limitation is that the relevant factors were not examined separately for the EQ-5D-3L and EQ-5D-5L, and the literature review conducted to identify factors did not follow the rigorous protocol of a systematic literature review (i.e. only one author conducted the screening). Finally, the applied grouping variables and their historical categories may not reflect ongoing demographic changes in Europe and will need to be revisited in the future. Previous studies noted that ethnicity and migrant/native status are related to differences in the utility values assigned to health states [65,66,67]. However, in this study, ethnicity was not a feasible variable for grouping countries into categories. Future research should address this issue and provide guidelines on how to assess health state preferences in heterogeneous populations, with consideration of ethnicity and migration status.

5 Conclusions

European countries were clustered on the basis of variables contributing to differences in preference-based valuations with the EQ-5D instruments, and supra-national value sets were estimated for these clusters. The supra-national value sets can be used when a national value set is not available and/or for regional decision making.