Key Points for Decision Makers

Indonesia does not have an EQ-5D value set.

An EQ-5D-5L value set was derived from a highly representative sample of the Indonesian general population.

Data were collected using a rigorous quality control procedure which led to logical and significant models.

This Indonesian EQ-5D-5L value set is now available and can be used in health economic evaluations and health-related quality-of-life studies in Indonesia that use EQ-5D.

1 Introduction

The Indonesian government wishes to improve equal access to healthcare by introducing universal health insurance. To ensure health technology assessment (HTA) can be undertaken for such an insurance scheme, Indonesia intends to employ cost-effectiveness analysis for new and existing medical interventions. Valuing the outcomes of a medical intervention in quality-adjusted life-years (QALYs) requires a quality-of-life instrument that can value the health states of patients using societal preferences, such as the EQ-5D instrument. At present, no Indonesian EQ-5D value set is available for the calculation of QALYs. A standardized valuation protocol exists for the 5-level version of EQ-5D. We employed this protocol with over 1000 respondents representative of the Indonesian population. Below we describe in more detail (1) the social, economic and organizational HTA setting that determines the demand and specifications for an Indonesian valuation study; (2) the EQ-5D-5L, its valuation protocol and the place of the EQ-5D in HTA; and (3) why we cannot rely on value sets from European and/or neighbouring countries.

Indonesia is located in South East Asia, with 255.5 million inhabitants in 2015 [1]. Commencing in January 2014, Indonesia has implemented universal healthcare coverage organized by the ‘Badan Penyelenggara Jaminan Sosial Kesehatan’ or BPJS Kesehatan: the Healthcare and Social Security Agency. The aim of the BPJS Kesehatan is to include all Indonesian citizens in the National Health Insurance system to enable them to obtain access to healthcare benefits and to provide protection with respect to basic health needs [2]. The decision-making process related to the implementation of this national health coverage and the adoption of new technologies can benefit from an evidence-based strategy and the application of HTA, a decision-making process involving economic evaluation and other considerations such as those of an ethical and organizational nature, to ensure the optimal use of health technologies for the population. In 2015, the Ministry of Health of Indonesia formed a national HTA committee (Komite Penilaian Teknologi Kesehatan). The committee’s expected output is a policy recommendation to the Minister on the feasibility of the health service(s) to be included in the National Health Insurance benefit package [3, 4].

Economic evaluation uses clinical evidence to provide systematic consideration of the effects of all available alternatives regarding health, healthcare costs, and other effects regarded as valuable [5]. Cost-utility analysis (CUA) is used to evaluate health-related quality-of-life (HRQOL) outcomes and to compare costs and outcomes between different healthcare programmes in terms of cost per QALY [5, 6]. A QALY is obtained by integrating health state utilities, as measured by multi-attribute utility instruments (MAUIs), over a lifetime. The three most widely used MAUIs are the EQ-5D, the Health Utility Index (HUI), and the Short Form 6D (SF-6D) [5,6,7,8]. Several national HTA organizations, for example in the UK and Thailand, have recommended EQ-5D as the preferred method for deriving utilities [9, 10]. Developed by the EuroQol Group, EQ-5D is a standardized generic instrument that collects descriptive HRQOL data on five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), followed by a self-rating of overall health status on a visual analogue scale (EQ VAS) ranging from 0 (‘worst imaginable health state’) to 100 (‘best imaginable health state’) [11, 12]. In 2011, the EuroQol Group expanded the levels of severity of the classic version of EQ-5D, renamed EQ-5D-3L, from three to five levels. This new instrument is designated ‘EQ-5D-5L’ [12]. Recent studies have shown that EQ-5D-5L produces a richer description of health states, a higher discriminatory power, and a lower ceiling effect compared with EQ-5D-3L [13,14,15,16,17,18]. The EuroQol Group has also developed a valuation protocol for EQ-5D-5L [19], and the EuroQol Group Valuation Technology (EQ-VT) template computerized the interview method to standardize EQ-5D-5L valuation studies in different countries. This protocol provides a value set for the calculation of QALYs using a societal perspective, the preferred perspective in health economics [5].

Indonesia does not have an EQ-5D value set, either for the 3-level or for the new 5-level version. Previous EQ-5D studies conducted in Indonesia measured health preferences using the Malaysian value set or values derived from citizens of the UK [20, 21]. However, for a value set to be valid for Indonesia it should represent the culture and living standards of Indonesia [22]. Moreover, the values should match the particular wording of the Indonesian instrument: for instance, if ‘cukup’ (i.e. ‘moderate’) is less severe in Bahasa Indonesia than in the Malaysian language (‘sederhana’) or in English, then the values should reflect that difference. For these reasons, the aim of our study was to obtain preferences from the general population in order to derive a national EQ-5D-5L value set for the calculation of QALYs from a societal, Indonesian perspective.

2 Methods

2.1 Respondents

A representative sample of at least 1000 respondents aged 17 and over was recruited from the Indonesian general population. Based on the work of Ramos-Goñi et al., 9735 composite time trade-off (C-TTO) responses were needed to obtain a standard error (SE) of 0.01 for the observed mean C-TTO value. Therefore, the 1000 respondents interviewed provided 10,000 C-TTO and 7000 discrete choice responses to estimate the models [23]. The adult population was defined as aged 17 and over, because in Indonesia the legal age to obtain an ID card, a driving license, and access to voting is 17. To ensure the representativeness of the final sample for the Indonesian general population, we used a multi-stage stratified quota method with respect to residence (urban/rural, as registered by the official national register); gender (male/female); age (17–30/31–50/>50 years); and level of education: basic (primary school and below), middle (primary school plus at least 1 year of high school) and high (all others). This resulted in a first stage of 36 quota groups (2 × 2 × 3 × 3). Two other categories, religion (Islam/Christian/Others) and ethnicity (own-declared ethnicity: Jawa/Sunda/Sumatera/Sulawesi/Madura-Bali/Others), were considered important as well. However, including them in the same way as residence, gender, age, and education would result in 36 × 3 × 6 = 648 quota groups. We therefore applied the religion and ethnicity quotas independently of the other factors. Religion and ethnicity are thus representative over the whole sample, but not necessarily within each of the 36 quota groups. To reflect this second layer of sampling, we refer to the method as a ‘multi-stage stratified quota’ method. The predefined quotas were based on updated data from the Indonesian Bureau of Statistics [1].
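The quota arithmetic can be checked with a short sketch. The category labels below are shorthand chosen for illustration; only the number of categories per factor is taken from the description above.

```python
from itertools import product

# First-stage quota factors (labels are illustrative shorthand).
residence = ["urban", "rural"]
gender = ["male", "female"]
age = ["17-30", "31-50", ">50"]
education = ["basic", "middle", "high"]

# Every combination of the four first-stage factors is one quota group.
first_stage = list(product(residence, gender, age, education))
n_first_stage = len(first_stage)

# Fully crossing religion (3 categories) and ethnicity (6 categories)
# with the first-stage groups multiplies the number of cells.
n_fully_crossed = n_first_stage * 3 * 6

print(n_first_stage, n_fully_crossed)
```

Crossing all six factors spreads roughly 1000 respondents over far more cells than can be filled reliably, which is why the religion and ethnicity quotas were applied at the level of the whole sample.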

We designed and used an online tool to ensure that the recruitment of respondents was in accordance with predefined quotas while the sampling was employed in different parts of the country. Interviews were conducted in the following six cities and their surroundings, located in different parts of Indonesia: Jakarta, Bandung, Jogjakarta, Surabaya, Medan, and Makassar. Respondents were recruited through a mixed strategy, i.e. through personal contact, local leader assistance, and from public places such as mosques and shopping streets. We also asked respondents to introduce us to other potential respondents. Interviews were conducted at the respondents’ or interviewers’ homes. For their participation, all respondents received a mug or a t-shirt specifically designed for the valuation study. Informed consent was obtained from all respondents included in the study. The study was approved by the Health Research Ethics Committee, Faculty of Medicine, Padjadjaran University, Indonesia.

2.2 Instruments

2.2.1 EQ-5D-5L

We used the official EQ-5D-5L Bahasa Indonesia version provided by the EuroQol Group. This translation of EQ-5D-5L was produced using a standardized translation protocol that followed international recommendations [24]. As briefly mentioned in the introduction, EQ-5D-5L consists of five dimensions: mobility (MO), self-care (SC), usual activities (UA), pain/discomfort (PD), and anxiety/depression (AD). Each dimension has five levels: no problems, slight problems, moderate problems, severe problems, and unable/extreme problems [12]. The EQ-5D-5L instrument describes 3125 (5^5) unique health states. A 1-digit number expresses the level selected for each dimension. Hence, a 5-digit number combining the levels of the five dimensions describes a specific health state. For example, state ‘11111’ indicates ‘no problems on any of the five dimensions’, while state ‘54321’ indicates ‘unable to walk about, severe problems washing or dressing, moderate problems doing usual activities, slight pain or discomfort, and no anxiety or depression’ [12]. Each health state has a so-called ‘sum score of the level digits’, which is the sum of the levels across dimensions; for example, the sum score of the level digits is 5 for ‘11111’ and 15 for ‘54321’. This EQ-5D descriptive system is followed by self-rating of overall health status on a visual analogue scale (EQ VAS) ranging from 0 (‘worst health you can imagine’) to 100 (‘best health you can imagine’).
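As an illustration of the descriptive system, the following minimal sketch enumerates all health states and computes the sum score of the level digits. The function names are hypothetical and not part of any EuroQol software.

```python
from itertools import product

def all_states(levels=5, dimensions=5):
    """Return every health state as a digit string, e.g. '11111' ... '55555'."""
    return ["".join(str(level) for level in combo)
            for combo in product(range(1, levels + 1), repeat=dimensions)]

def sum_score(state):
    """Sum of the level digits of a health state such as '54321'."""
    return sum(int(digit) for digit in state)

states = all_states()
print(len(states), sum_score("11111"), sum_score("54321"))
```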

2.2.2 Valuation Protocol

The EQ-5D-5L valuation protocol consists of six sections [19]:

  1. A general welcome, where the interviewer explains the objectives of the research, followed by filling in the informed consent when the individuals agree to participate.

  2. Introduction to and completion of the descriptive system, VAS and background questions (age, sex, experience of illness, religion, ethnicity and education).

  3. C-TTO (see Sect. 2.2.3 below) tasks followed by a ‘Feedback Module’ task. Each respondent has to complete one example (health state: being in a wheelchair), three practice health states (mild: ‘21121’; severe: ‘35554’; and moderate but difficult to imagine: ‘15411’) and ten ‘real’ C-TTO tasks valuing hypothetical EQ-5D-5L health states. In the Feedback Module task, the respondents check whether they agree with the order of the health states they valued before. The EQ-VT screen shows the health states from the ten C-TTO tasks arranged by the values given by the respondent, from the lowest value at the bottom to the highest value at the top. Respondents are allowed to ‘flag’ the health state(s) whose position relative to the other health states they do not agree with, but they are not allowed to alter their initial values. Three debriefing questions regarding the difficulties of the C-TTO tasks are added at the end of this section.

  4. A discrete choice experiment (DCE, see Sect. 2.2.3 below) followed by three debriefing questions regarding the DCE. Each respondent has to complete seven forced-pair comparisons.

  5. A round-up, where respondents can comment on the valuation tasks.

  6. Country-specific questionnaire(s) (if any).

All sections were administered utilizing computer-assisted face-to-face interviews employing the EQ-VT platform version 2.0.

2.2.3 Preference Elicitation Methods

Time trade-off (TTO) has been widely used as a standard method to elicit preferences [25, 26]. C-TTO uses conventional TTO to elicit better-than-dead (BTD) values, and lead-time TTO to elicit worse-than-dead (WTD) values. Details regarding C-TTO can be found in the study by Oppe et al. [27]. In summary, respondents were first faced with ‘conventional’ TTO where they had to choose between 10 years in an impaired health state (Life B) and 10 years of full health (Life A). After a series of choice-based iterations, respondents achieved a point of equivalence between the length of time in full health (Life A): ‘x’ and a period of time (10 years) in the impaired health state (Life B). The impaired health state value is defined as x/10. For example, if a respondent could not differentiate between 3 years of full health in Life A and 10 years living in Life B, then that health state value would be 0.3 (3/10). For a really poor health state, respondents might prefer to die immediately; that is, the value for that specific health state is <0 (death value = 0). In this case, the lead-time TTO approach was introduced to allow respondents to express a value below the value of death; that is, below 0. The two lives in the lead-time TTO are 10 years of full health (Life A) and 10 years of full health followed by 10 years in the impaired health state (Life B). When respondents reach an indifference point between the amount of time ‘x’ in Life A and Life B, the health state value is defined as (x − 10)/10. Hence, −1 is the lowest possible value of a given health state, generated from trading the full 10 years of Life A in a lead-time TTO.
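The two value formulas described above can be written as a small function. This is a minimal sketch; the function and argument names are hypothetical, and `x_years` stands for the time in full health (Life A) at the point of indifference.

```python
def ctto_value(x_years, worse_than_dead=False):
    """Convert the indifference point of a C-TTO task into a health state value.

    Conventional TTO (better than dead): value = x / 10.
    Lead-time TTO (worse than dead):     value = (x - 10) / 10.
    """
    if not 0 <= x_years <= 10:
        raise ValueError("x_years must lie between 0 and 10")
    if worse_than_dead:
        return (x_years - 10) / 10
    return x_years / 10

# Indifference between 3 years of full health and 10 years in the state.
print(ctto_value(3))
# Trading away all 10 years of Life A in the lead-time task.
print(ctto_value(0, worse_than_dead=True))
```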

The EQ-5D-5L valuation protocol included 86 EQ-5D-5L health states to be valued using C-TTO. The 86 health states were distributed into ten blocks with a similar level of severity. Eighty unique health states were selected using Monte Carlo simulation (eight unique health states included in each block), together with five very mild states (only one dimension at level 2 and all others at level 1, e.g. ‘11112’; each included in two blocks) and the most severe or ‘pits’ state (‘55555’; included in all blocks) [19]. Respondents were randomly assigned to one of the ten C-TTO blocks. The states in each block were presented to respondents in random order using the EQ-VT platform.

However, TTO has its limitations. The EuroQol Group considered different valuation techniques to be used in conjunction with TTO to make the valuation studies more robust and valid. Previous experiments with DCE, such as the study by Stolk et al. using EQ-5D-3L [28] or Ramos-Goñi et al. using EQ-5D-5L [29], showed that DCE is a valid technique for eliciting health preferences from respondents. Since both TTO and DCE try to measure the same concept, it was anticipated that DCE could be used in combination with TTO [30]. In the light of this reasoning, DCE was included in the EuroQol VT protocol.

Each DCE task was conducted by presenting two health states and asking the respondent to select the preferred state for him/her. The DCE design consisted of 196 pairs of EQ-5D-5L health states distributed over 28 blocks, each consisting of seven pairs with a similar severity [19]. The seven paired comparisons were presented in random order by the EQ-VT; in addition, the right–left order of the two health states presented was also randomized.

2.3 Data Collection

At the outset, 13 interviewers were recruited and trained intensively in a 1-day workshop at two locations: (1) Jakarta for interviewers who worked in Jakarta, Bandung and Makassar; and (2) Jogjakarta for interviewers who worked in Jogjakarta, Surabaya and Medan. Each interviewer performed at least five pilot interviews in the week after training. Their experiences were discussed and feedback was given by the daily supervisor. Only after this were they permitted to conduct interviews for the actual data collection. Three additional interviewers were hired during the data collection; they received similar training and met similar requirements to the first 13. Interviews were performed between March 9, 2015 and January 24, 2016. After 102 interviews we evaluated the quality of the interviews (see Sect. 2.5 below) and concluded that their quality was not yet sufficient. Hence we retrained the interviewers and treated the 102 interviews collected thus far as pilot interviews, excluding them from the data analysis. A detailed description of this decision-making process and the retraining of the interviewers is provided elsewhere [30].

2.4 Exclusion Criteria

There were two main criteria for data exclusion: lack of completion of an interview and characteristics of respondents’ answers that related to poor understanding of the task or to errors [31]. Note that the first criterion concerns excluding respondents and the second excludes respondent answers/responses.

With respect to the first criterion, interviews were excluded when respondents did not finish the interview for the following reasons: (1) the respondent indicated that he/she did not want to continue the interview process, (2) interviewers concluded that the respondent was unable to differentiate between the different dimensions and levels of EQ-5D-5L, (3) interviewers concluded that the respondent was not able to comprehend the C-TTO task during the practice session. When an interview had to be stopped during the C-TTO task it was excluded from the study.

With respect to the second criterion, completed interview responses were excluded on account of any of the following characteristics: (1) a respondent had a positive slope on the regression between his/her values on C-TTO and the ‘sum score of the level digits’, as this would indicate that the respondent provided higher utility values for poorer health states on average—the slope of the regression between C-TTO and the ‘sum score of the level digits’ was generated as part of the standard quality control report; (2) when a response in the C-TTO tasks was judged to be irrational: for instance, preferring life B (10 years in the corresponding health state) to life A (10 years in full health) and not shifting after his/her initial response was reconfirmed by the interviewer; (3) responses that were marked by the respondents in the Feedback Module task, which was a sign that the respondents disagreed with the valuation of those responses.
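The first characteristic can be illustrated with an ordinary least squares slope of a respondent's C-TTO values on the sum score of the level digits. This is a minimal sketch with hypothetical function names and made-up example values; the QC report used in the study may compute the slope differently.

```python
def ols_slope(xs, ys):
    """Slope of the least squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def has_positive_slope(states, values):
    """True if values rise with severity, i.e. with the sum of the level digits."""
    scores = [sum(int(digit) for digit in state) for state in states]
    return ols_slope(scores, values) > 0

# Made-up responses for illustration only.
example_states = ["11112", "21334", "55555"]
plausible = [0.9, 0.4, -0.5]   # worse states get lower values
reversed_ = [-0.5, 0.4, 0.9]   # worse states get higher values
print(has_positive_slope(example_states, plausible))
print(has_positive_slope(example_states, reversed_))
```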

2.5 Quality Control

To ensure data quality, we followed the quality control (QC) process described by Ramos-Goñi et al. [32], which consisted of minimum quality criteria and cyclical feedback to improve interviewers’ skills. The EuroQol Group facilitates use of the EQ-VT QC tool, a software programme that automates the production of QC reports based on data from EQ-VT studies. Bi-weekly teleconference meetings were organized to discuss the QC reports with the EQ-VT support team. The aim of these meetings was to evaluate and improve the interviewers’ performance and to check for possible non-compliance with the interview protocol.

2.5.1 Minimum Quality Criteria

The QC reports provided a number of statistics related to the quality of the data collected thus far, differentiated by interviewer. Each interview was checked for four signs of potentially poor quality:

  1. Wheelchair time: the interviewer took <3 min to explain the ‘wheelchair example’ preceding the actual C-TTO tasks.

  2. Wheelchair lead-time: the interviewer did not explain the WTD element of the wheelchair example.

  3. C-TTO duration: completing the ten C-TTO tasks took <5 min.

  4. Inconsistency: the value for state ‘55555’ was not the lowest and it was at least 0.5 higher than that of the state with the lowest value.

If any of the four above-mentioned signs are observed, the interview is ‘flagged’ as being of suspicious quality. If four or more of the interviews are flagged as being of poor quality, all ten interviews thus far conducted by that specific interviewer are removed and retraining of that interviewer is conducted. After a further ten interviews, the performance and compliance are re-evaluated. If again four or more interviews are flagged, the next set of ten interviews will also be removed and the interviewer is removed from the data collection process. Quality control focused on the interviewer; responses in flagged interviews were not removed from the data that was analysed.
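The flagging rules above can be sketched as follows. The thresholds are those stated in the text; the `Interview` record and its field names are hypothetical and do not reflect the data format of the EQ-VT QC tool.

```python
from dataclasses import dataclass

@dataclass
class Interview:
    wheelchair_minutes: float   # time spent on the wheelchair example
    explained_lead_time: bool   # WTD element of the example was explained
    ctto_minutes: float         # time spent on the ten C-TTO tasks
    values: dict                # state -> C-TTO value, must include '55555'

def is_flagged(interview):
    """True if any of the four minimum quality criteria is violated."""
    lowest = min(interview.values.values())
    pits = interview.values["55555"]
    inconsistent = pits > lowest and pits - lowest >= 0.5
    return (interview.wheelchair_minutes < 3
            or not interview.explained_lead_time
            or interview.ctto_minutes < 5
            or inconsistent)

def needs_retraining(batch_of_ten):
    """True if four or more interviews in a batch of ten are flagged."""
    return sum(is_flagged(iv) for iv in batch_of_ten) >= 4

# Made-up interviews for illustration only.
good = Interview(4.0, True, 12.0, {"11112": 0.9, "55555": -0.5})
rushed = Interview(2.0, True, 12.0, {"11112": 0.9, "55555": -0.5})
odd = Interview(4.0, True, 12.0, {"11112": 0.0, "55555": 0.5})
print(is_flagged(good), is_flagged(rushed), is_flagged(odd))
```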

The DCE part of the valuation study was also monitored to detect suspicious response patterns. Assuming that A is the health state at the left of the screen and B is the health state at the right of the screen, then a consistent preference for the left (A) would be suspicious (AAAAAAA). The same would apply for the response pattern BBBBBBB, ABABABA, BABABAB. This was also reported in the QC report.
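A check for these four response patterns takes only a few lines. The sketch below assumes each respondent's seven choices are recorded as the letters A (left) and B (right) in the order presented; the names are hypothetical.

```python
# The four patterns named in the text.
SUSPICIOUS_PATTERNS = {"AAAAAAA", "BBBBBBB", "ABABABA", "BABABAB"}

def is_suspicious(choices):
    """True if the seven DCE choices match one of the suspicious patterns."""
    return "".join(choices) in SUSPICIOUS_PATTERNS

print(is_suspicious("AAAAAAA"))
print(is_suspicious(["A", "B", "B", "A", "B", "A", "A"]))
```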

2.5.2 Cyclical Feedback

The retraining programme conducted by the daily supervisor was held in 2 locations: (1) Jakarta for interviewers who worked in Jakarta, Bandung and Makassar; and (2) Jogjakarta for interviewers who worked in Jogjakarta, Surabaya and Medan. The QC reports for their interviews were presented, discussions were held to address non-compliance problems, and suitable solutions were agreed upon among the interviewers. After the retraining programme, the daily supervisor continuously created QC reports, made notes at the group and individual levels, and sent feedback to the interviewers, so that they were able to learn from their own and other interviewers’ performance.

2.6 Data Analysis

We describe the sample characteristics including self-reported health on the EQ-5D-5L descriptive system and the EQ-VAS using percentages for discrete variables and means and standard deviations for continuous variables in comparison with the Indonesian population. A general z test was used to investigate whether the proportions in the sample were similar to, or different from, the general population.
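The text does not specify the exact form of the z test. As one plausible reading, the sketch below implements a one-sample z test that compares a sample proportion with a known population proportion; the function name and the example counts are hypothetical and are not taken from Table 1.

```python
import math

def one_sample_proportion_z(sample_count, n, population_proportion):
    """Two-sided z test of a sample proportion against a known proportion."""
    p_hat = sample_count / n
    se = math.sqrt(population_proportion * (1 - population_proportion) / n)
    z = (p_hat - population_proportion) / se
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up counts for illustration only.
z_equal, p_equal = one_sample_proportion_z(527, 1054, 0.5)
z_diff, p_diff = one_sample_proportion_z(580, 1054, 0.5)
print(z_equal, p_equal)
print(z_diff, p_diff)
```

With more than 1000 respondents the standard error is small, so even modest absolute differences in proportions can reach statistical significance.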

In this investigation we used TTO (specifically C-TTO) and DCE. TTO has limitations such as loss aversion [33], but also has the advantage that TTO-based value sets are anchored on a scale of (0) death to (1) full health. DCE is not exempt from limitations, as lexicographic behaviour from respondents has been widely reported in the literature [34]. It is also notable that DCE, in its present form, where time is not incorporated in health state presentations, does not anchor value sets on a (0) death to (1) full health scale. Therefore, DCE produces value sets on an arbitrary scale based on the relative distances between health states.

Both techniques attempt to measure health state preferences, but they rest on different underlying assumptions and do not seem to share the same limitations. Therefore, the data obtained from these two elicitation methods could be seen as complementary, not necessarily competing with each other. Hence, we chose the solution presented by Oppe and van Hout [35], who combined DCE with C-TTO in a ‘hybrid model’, imposing the (0) death to (1) full health scale as determined by C-TTO.

To illustrate how the hybrid model combined C-TTO and DCE responses in this study, we also present the results from the models estimated from each C-TTO and DCE separately, with the same assumptions as those used for the hybrid model. We used the 20-parameter main effects model, which estimates four parameters for the five levels of each of the five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Each coefficient represents the additional utility decrement of moving from one level to another. Hence, the overall decrement of moving from ‘no problems’ to ‘unable/extreme problems’ is calculated as the sum of the coefficients of ‘no problems to slight problems’, ‘slight problems to moderate problems’, ‘moderate problems to severe problems’, and ‘severe problems to unable/extreme problems’.
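The incremental structure of the main effects model can be written compactly. The notation below is introduced here for illustration and is not taken from the cited papers: $s_d$ is the level of dimension $d$ in health state $s$, and $\beta_{d,l}$ is the additional decrement of moving from level $l-1$ to level $l$ on dimension $d$.

```latex
U(s) \;=\; 1 \;-\; \sum_{d \in \{\mathrm{MO},\,\mathrm{SC},\,\mathrm{UA},\,\mathrm{PD},\,\mathrm{AD}\}} \;\sum_{l=2}^{s_d} \beta_{d,l}
```

For a dimension at level 1 the inner sum is empty, so that dimension contributes no decrement, and state ‘11111’ has a value of 1. With four increments for each of five dimensions, this gives the 20 parameters of the model.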

Presenting the TTO, the DCE and the hybrid model also allows us to compare the value distribution in the form of the correlations between the predicted values of the models, and we can compare the weights of the individual dimensions. This gives information about construct validity in the form of ‘convergent validity’, or ‘concordance’.

Modelling was undertaken using the STATA statistical package. C-TTO data were modelled using the response values as dependent variables and the health states as explanatory variables. This was achieved by the implementation of a Tobit model (hyreg with ll() option), which assumes a latent variable Y*_it underlying the observed C-TTO value Y_it when there is either left- or right-censoring in the dependent variable. The C-TTO data, in particular the lead-time C-TTO for WTD health states, are by nature censored at −1 [ll(−1) option on hyreg command]. This means that the C-TTO method records a value of −1 even when the latent preferences of respondents may include values lower than −1 [36]. The Tobit model accounts for this censoring by estimating the latent variable Y*_it, which can take on predicted preference values extrapolated beyond the range of the observed values. The variance of C-TTO data is not homogeneous among health states; this led us to model the C-TTO data as heteroskedastic. We used the hetcont() option of the hyreg command as suggested by Ramos-Goñi et al. [37]. The dummy variables included in the hetcont() option were the same as those included in the main model, that is, the 20 dummies that specified the main effects model.
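The censoring mechanism can be summarized in textbook Tobit form. This is a sketch under stated assumptions: $\mu_{it}$ denotes the value predicted by the main effects model for the state valued by respondent $i$ in task $t$, the error is assumed normal, and $\sigma_{it}$ is allowed to vary with the health state. How hyreg parameterizes $\sigma_{it}$ through hetcont() is not described here and is left unspecified.

```latex
Y^{*}_{it} = \mu_{it} + \varepsilon_{it}, \qquad \varepsilon_{it} \sim N\!\left(0, \sigma_{it}^{2}\right), \qquad
Y_{it} = \max\!\left(-1,\; Y^{*}_{it}\right)

\ell_{it} =
\begin{cases}
\log\!\left[\dfrac{1}{\sigma_{it}}\,\phi\!\left(\dfrac{Y_{it}-\mu_{it}}{\sigma_{it}}\right)\right] & \text{if } Y_{it} > -1,\\[2ex]
\log \Phi\!\left(\dfrac{-1-\mu_{it}}{\sigma_{it}}\right) & \text{if } Y_{it} = -1.
\end{cases}
```

Here $\phi$ and $\Phi$ are the standard normal density and distribution function. An observation at −1 contributes the probability that the latent value lies at or below −1, which is what allows predicted values below −1.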

DCE (forced pair comparisons in our case) responses were modelled as a conditional logistic regression model including the same 20 dummy parameters as those used for the C-TTO data. Nevertheless, we did not use the coefficients estimated from a conditional logit model because they were expressed on a latent arbitrary utility scale. We rescaled the DCE coefficients using the same parameter θ that was estimated in the hybrid model. This rescaling assumes that the C-TTO model coefficients are proportional to the DCE model coefficients. For more details on the modelling see Ramos-Goñi et al. [23, 37].

Pearson product-moment correlation analysis was applied to measure the strength and direction of association that exists between C-TTO, DCE rescaled and hybrid predicted values for 3125 health states.
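For reference, the Pearson product-moment correlation between two sets of predicted values can be computed as in the following minimal sketch. The function name and the example numbers are hypothetical; the study itself used STATA.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up predicted values for three states under two models.
r = pearson_r([0.9, 0.4, -0.5], [0.92, 0.35, -0.6])
print(r)
```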

3 Results

3.1 Respondent Characteristics

In total, 1056 of 1117 respondents who were approached after the retraining of the interviewers completed the interview. Reasons for interview failure were refusal to participate (36, 3.2%), conflicting schedules (14, 1.25%), discontinuation of the interview at the respondent’s request (10, 0.89%), and discontinuation of the interview by the interviewer’s decision because of the respondent’s lack of understanding (1, 0.09%). From the remaining 1056 respondents, we excluded two respondents who had a positive slope on the regression between their values on C-TTO and the sum score of the level digits of the health states, indicating that the respondent provided higher utility values for poorer health states on average, leaving 1054 respondents in the final sample. No interviewers were removed because of persistent low-quality data.

Characteristics of the respondents in the final sample were similar to those of the Indonesian population in terms of residence, gender, and religion. There were some statistically significant differences in some of the age groups, education levels, and ethnicities, but the absolute differences are small as these are <4% (Table 1).

Table 1 Characteristics of the study respondents/general population

3.2 Self-Reported Health Problems

Table 2 shows that the highest proportion of health problems was reported in the pain/discomfort dimension (39.66% reported ‘any problems’) and the lowest in the self-care dimension (1.9%). From the final sample, 464 (44.02%) reported no health problems on any dimension (‘11111’).

Table 2 Self-reported health using the EQ-5D-5L descriptive system and the EQ VAS

3.3 Data Characteristics

The 1054 respondents provided 10,540 C-TTO observations (respondents valued ten health states each). We excluded 45 observations because they were ‘irrational answers’: the respondent preferred life B (10 years in the corresponding health state, which is worse than full health) to life A (10 years in full health) and did not shift after the interviewer reconfirmed the initial response. Furthermore, 1033 observations that were flagged by the respondents in the Feedback Module task were removed. Accordingly, the C-TTO dataset contained 9462 observations. Of these, 187 (1.97%) observations had the value 0, and another 3349 (35.39%) were negative values (see Fig. 1 for the histogram of the observed C-TTO values). The 86 observed mean C-TTO values ranged from −0.719 for state ‘55555’ to 0.909 for state ‘12111’. The mean observed values were negative for 29 of the 86 health states used in the C-TTO design (see Online resource 1 in the electronic supplementary material).
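The flow of C-TTO observations from collection to analysis can be verified with the counts reported above; the variable names are hypothetical.

```python
respondents = 1054
tasks_per_respondent = 10

# Observations collected before any response-level exclusion.
collected = respondents * tasks_per_respondent

irrational = 45             # irrational answers excluded
flagged_in_feedback = 1033  # flagged by respondents in the Feedback Module

analysed = collected - irrational - flagged_in_feedback
print(collected, analysed)
```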

Fig. 1 Observed C-TTO values. C-TTO composite time trade-off

The DCE dataset comprised 7378 observations (all respondents completed seven paired comparisons). Twenty respondents (1.89%) answered with suspicious patterns: AAAAAAA (always chose the health state at the left of the screen), BBBBBBB (always chose the health state at the right of the screen), ABABABA or BABABAB; however, their responses were not excluded from the final dataset.

3.4 Modelling Results

There were 657 (6.92%) left-censored C-TTO observations, that is, observations where the respondent gave the lowest possible value (−1) for a health state in the C-TTO task. The Tobit C-TTO model results were logically consistent. Conditional logistic regression was used to model the DCE responses, and these results were also logically consistent (we used the rescaled DCE coefficients). C-TTO and rescaled DCE predicted values for the 3125 health states were highly correlated, as Fig. 2a shows (r = 0.9881, p value <0.0001). Table 3 shows that both sets of coefficients were in relative agreement; that is, the most important dimension was mobility and the least important was pain/discomfort. The hybrid model, which utilized both C-TTO and DCE data, was also in relative agreement with both the C-TTO and DCE models. Figure 2b, c shows a high correlation of the hybrid predicted utilities with those predicted from C-TTO (r = 0.995, p < 0.0001) and rescaled DCE (r = 0.997, p < 0.0001).

Fig. 2 a Comparison of C-TTO and DCE rescaled predicted utilities. b Comparison of C-TTO and hybrid predicted utilities. c Comparison of DCE rescaled and hybrid predicted utilities. C-TTO composite time trade-off, DCE discrete choice experiment

Table 3 Estimation results for C-TTO model, DCE rescaled model, and hybrid model

The hybrid model with main effects was logically consistent (Table 3). Using this as the final model to obtain values for the 3125 EQ-5D-5L health states, the maximum value was 1.000 for full health (health state ‘11111’), followed by health state ‘11112’ with a value of 0.921. The minimum value was −0.865 for the ‘55555’ state. Of the 3125 health states, 1108 (35.46%) had negative values using the hybrid model. The coefficients from the hybrid model were also in agreement with the previous two models, with mobility appearing as the most important dimension and pain/discomfort as the least important.

To obtain utility for an EQ-5D-5L health state, for instance ‘12345’, the following calculation based on the hybrid model (final value set) is needed:

Utility weight (‘12345’) = 1 − no problems in MO (0) − no problems to slight problems in SC (0.101) − no problems to slight problems in UA (0.090) − slight problems to moderate problems in UA (0.066) − no problems to slight problems in PD (0.086) − slight problems to moderate problems in PD (0.009) − moderate problems to severe problems in PD (0.103) − no problems to slight problems in AD (0.079) − slight problems to moderate problems in AD (0.055) − moderate problems to severe problems in AD (0.093) − severe problems to extreme problems in AD (0.078) = 0.240.

Note that each coefficient represents the additional utility decrement of moving from one level to another.
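The calculation above can be sketched in code as follows. This is an illustrative sketch only: the dictionary holds just the incremental decrements quoted in the worked example for ‘12345’, and the names `DECREMENTS`, `DIMENSIONS` and `utility` are ours. The remaining coefficients must be taken from Table 3, so the function raises an error instead of guessing when a state needs a coefficient that is not quoted here.

```python
# Incremental decrements quoted in the worked example for state '12345'.
# Key: (dimension, level reached); value: extra decrement for moving up
# from the previous level. Coefficients not quoted in the text (e.g. all
# mobility levels) are deliberately absent; they are listed in Table 3.
DECREMENTS = {
    ("SC", 2): 0.101,
    ("UA", 2): 0.090, ("UA", 3): 0.066,
    ("PD", 2): 0.086, ("PD", 3): 0.009, ("PD", 4): 0.103,
    ("AD", 2): 0.079, ("AD", 3): 0.055, ("AD", 4): 0.093, ("AD", 5): 0.078,
}

# Dimension order in a health state label: mobility, self-care,
# usual activities, pain/discomfort, anxiety/depression.
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def utility(state, decrements=DECREMENTS):
    """Utility of a state such as '12345' from incremental decrements."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    total = 0.0
    for dim, ch in zip(DIMENSIONS, state):
        # Sum every step from level 1 up to the reported level.
        for level in range(2, int(ch) + 1):
            if (dim, level) not in decrements:
                raise KeyError(f"coefficient for {dim} level {level} not supplied")
            total += decrements[(dim, level)]
    return round(1.0 - total, 3)


print(utility("12345"))  # 0.24, matching the worked example
print(utility("11112"))  # 0.921, matching the value reported for '11112'
```

Because each coefficient is incremental, a level 5 problem accumulates four decrements for that dimension, which is why the anxiety/depression terms appear four times in the example.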

4 Discussion

The aim of this study was to obtain social preferences and thus derive an EQ-5D-5L value set from the Indonesian general population. To obtain values for 3125 EQ-5D-5L health states, 1054 respondents were interviewed using the computer-assisted EuroQol Group valuation protocol. C-TTO and DCE were part of the protocol employed in six cities and their surrounding areas. We used an iterative quality control approach in order to obtain high-quality data. The socio-demographic characteristics of the respondents were similar to those of the Indonesian population with respect to residence, gender, age, level of education, ethnicity, and religion. This makes EQ-5D-5L suitable for health economic evaluations that will benefit the national health insurance scheme. Furthermore, non-HTA studies in Indonesia such as those using patient-reported outcome measures (PROMs), clinical trials or improvements in hospital care could use EQ-5D-5L as an instrument to measure HRQOL, with the notion that the values attached to the health states are societal values.

Several limitations of this study should be considered. There were still statistically significant differences between the sample and the data provided by the National Bureau of Statistics in the distribution of background variables, but these were small and limited to some age groups, some education levels, and some ethnicity groups. As a check to see if such small differences were of importance, we compared observed C-TTO values for each health state between respondents of different ages, education levels, and ethnicities. There was no clear pattern of differences in the health state values. Moreover, as can be seen in Table 1, the percentage deviations were small, and statistical significance should be seen in the light of the statistical power of more than 1000 respondents. Given these observations, and given that weighting for background variables would add complexity, we chose not to introduce weighting for these small deviations from full representativeness.

The strategy of finding respondents using personal networks of the interviewers and the respondents could raise questions about the objectivity/representativeness of the study sample. Yet we preferred this way of recruitment in order to find respondents who fit into the pre-determined quota groups, because we judged it to be a lesser problem than insufficiently filled categories in the quota sampling. The quota groups were determined on the variables that were considered to be important in defining representativeness. In that respect, we have constructed a representative sample based on pre-determined variables: rural/urban, gender, age, level of education, religion and ethnicity. Further investigation could establish whether recruiting respondents via the personal networks of interviewers and/or respondents is an acceptable approach.

Indonesia has five major islands that are inhabited by 93.5% of the population [1]. However, 92.9% of respondents interviewed in this study were living on Java Island. This might raise questions about the representativeness of the study sample. We focused the data collection on Java Island because it is the most populous island (57% of the population) and the main target of migration from all over Indonesia. The diversity of its residents in terms of ethnicity helped to fill all the categories in our quota sampling in a cost-effective way. We do not know whether the values obtained in Java from these migrants would have differed had the interviews been conducted on their islands of origin. One way to investigate whether location is indeed an issue in valuing health in Indonesia would be to sample values for health states at different places/islands in the republic. For instance, the same health states could be valued in Aceh (west), Java (middle) and Papua (east). Such a study could then be used to provide the motivation for additional studies that sample the values for people living in other parts of the archipelago. For the time being, we conclude that the present value set is the most representative value set for the EQ-5D-5L now available for Indonesia.

Several study findings are worth highlighting. First, this is the first study in Asia to have used the hybrid model to maximize information obtained from C-TTO and DCE. The models demonstrated logical consistency and significant regression coefficients. A likely reason is that the data were of high quality, which was assured by (1) the extensive use of the QC report provided by the EuroQol Group, and (2) the retraining programme conducted after dropping the first 102 interviews owing to their poor quality [30]. The QC report identified the first 102 interviews as problematic; indeed, further analysis using the hybrid model demonstrated that the results of these interviews showed logical inconsistencies in the self-care and pain/discomfort dimensions, together with a regression coefficient that was not significant for pain/discomfort level 4 (p = 0.179). The lesson learned here is that even sophisticated models profit from high-quality data.

Second, the Indonesian results present more negative values than any other EQ-5D-5L valuation study undertaken so far (i.e. in the UK, the Netherlands, Canada, Uruguay, Japan and Korea [38,39,40,41,42,43]). It could be argued that the high number of negative values is the result of an interaction between a process-related factor (the quality control process) and a culture-related factor (interdependence among the members of a society, i.e. collectivism vs individualism). This study implemented the quality control process rigorously. It is possible that this process provided the interviewers with better feedback and therefore better skills to administer the complex WTD trade-offs. A more valid administration of the C-TTO, with more interviewers following the protocol, could have led to a higher proportion of negative values. The cultural factor, namely collectivism, might also play a role. People from collectivistic cultures, such as Indonesia, are more concerned with how their illness might affect their closest circles such as family and friends [44]. Moreover, they are more reluctant to explicitly ask for help [45]. Some comments from our respondents support this: having severe or extreme/unable problems in EQ-5D dimensions was very bad for them individually, but would also be a burden for their closest circles (family and friends). Other respondents said they would prefer to die rather than bother their families and friends when they have a severe illness. The EQ-5D-3L value set of Singapore, a neighbouring country of Indonesia and a collectivistic country as well, showed a value of −0.769 for the all-worst state ‘33333’ [46]. When more national valuation studies are published, further investigation may be able to disentangle the effects of these factors on the proportion of WTD values in an EQ-5D-5L valuation study.

Third, we had a low level of non-response: only 61 of the 1117 respondents. Our recruitment strategy, which involved local leaders and asking respondents to recommend our study to other people, contributed to this low number.

Fourth, this study was performed according to the EuroQol Group’s EQ-5D-5L valuation protocol. Hence, the results are comparable to findings obtained in other countries. The final Indonesian value set shows that the mobility dimension influenced utility estimates the most, similar to EQ-5D-5L valuation study results from Uruguay and South Korea [41, 43]. The pain/discomfort dimension had the least influence on utility estimates, quite the opposite of the EQ-5D-5L value sets of England and the Netherlands where this dimension was in the top two most influential, after anxiety/depression [38, 39]. Perhaps this was because, in countries such as the Netherlands and the UK, problems with mobility had less influence due to better infrastructure provision and less emphasis on manual labour. It could also be argued that Indonesian people have adapted to mild levels of pain or discomfort, or perhaps they considered a mild level of pain or discomfort something they have to live with. The same line of reasoning applies to anxiety/depression. Indonesian people report more problems with anxiety/depression and have adapted to these mild levels of anxiety/depression, or they consider this as part of normal life. It could also be a result of small differences in translation. If the translated Indonesian words for depression and anxiety refer to a lighter problem, then it makes sense that the prevalence was higher and the disutility lower. Indeed, there are some indications that this was the case. In the Indonesian EQ-5D translation, the word ‘sedih’, which might also be translated as ‘sadness’, is added to the description of the anxiety/depression dimension. These kinds of interactions between the description of the dimensions and the values attached justify attempts to utilize local and linguistically matched value sets for utility questionnaires such as EQ-5D. If not, value sets based on other languages might apply the wrong (higher) utilities to the descriptors. For instance, it is now clear that one cannot use the UK value for anxiety/depression for the Indonesian descriptor with an additional word ‘sadness’.

Several policy implications of the present study can be considered. The finding that the mobility dimension most affects utility could be reflected in Indonesian government policies, such as allocating more funds to the prevention of diabetic foot ulcers or to other interventions that improve mobility, like better wheelchairs. Moreover, the anxiety and depression problems reported should be addressed. In doing so, the discussion concerning the translation of the anxiety and depression dimension mentioned in the paragraph above should be taken into account. If indeed anxiety and depression are such common afflictions in Indonesia, mental health treatment by professionals such as psychologists and psychiatrists within the national health insurance scheme should be considered.

Indonesia is endeavouring to implement HTA comprehensively. The present research shows that in measuring and valuing quality of life, Indonesia bears comparison with the leading countries employing HTA. Evidently, Indonesia still has ground to cover when dealing with models, cost data and decision making. Nevertheless, this research shows that it is possible to arrive at an established level of HTA methodology in a short time-span when cooperating at an international level.

5 Conclusion

This paper contains the EQ-5D-5L value set for Indonesia based on Indonesian citizens’ health preferences. We expect our results to promote and facilitate health economic evaluations in Indonesia which can help to inform decision makers concerning resource allocation decisions.

Data Availability Statement

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.