Plain English summary

1. Why is this study needed?

The EORTC QLU-C10D is a preference-based multi-attribute utility instrument (MAUI) derived from the EORTC Quality of Life Questionnaire-Core 30 (QLQ-C30), a health-related quality-of-life (HRQL) questionnaire widely used in cancer clinical trials internationally. The QLU-C10D enables quantification of utility from responses to the QLQ-C30, and hence enables HRQL data to be used in health policy decisions about cancer. Such decisions are typically made within a country, relating to the health budget of that country, or of regional and local health authorities. Therefore, country-specific ‘value sets’ based on the values and preferences of the general population of specific countries are needed.

2. What is the key problem/issue/question this manuscript addresses?

No value sets for the EORTC QLU-C10D existed for Japan prior to this study.

3. What is the main point of your study?

The valuation survey used to develop the Japanese QLU-C10D value set followed the standard protocol developed by the Multi-Attribute Utility in Cancer (MAUCa) Consortium for evaluating the EORTC QLU-C10D in general population samples. The valuation method used was an online discrete choice experiment. The resultant value set enables HRQL data from the EORTC QLQ-C30 to be used in Japanese health policy decisions and health technology assessment.

4. What are your main results and what do they mean?

A Japanese value set for the EORTC QLU-C10D was created. Physical functioning, role functioning, and pain were associated with the largest utility weights. The value of the worst health state was -0.221, lower than that seen in most other existing QLU-C10D country-specific value sets.

Introduction

In economic evaluations of healthcare technologies, quality-adjusted life years (QALYs) are the standard outcome measure. QALYs are calculated by weighting life years by the utility of the health state [1]. In Japan, since 2019, economic evaluation submissions have been required for selected drug and medical device pricing before the Ministry of Health, Labour and Welfare (MHLW) can approve higher prices than for existing drugs [2]. As of December 2022, evaluations of 39 drugs and devices had been completed or were in progress. The guideline for submission to the authority [3] indicates that “QALY should be used in principle” and “If Japanese quality-of-life (QOL) scores (utilities) are newly collected for a cost-effectiveness analysis, EQ-5D-5L is recommended as the first choice.” However, it does not preclude the use of alternative utility instruments.

For the purposes of economic evaluation, utility is anchored as 0 = dead and 1 = full health, which is necessary for construction of QALYs. To obtain scores on this utility scale, we typically use a preference-based measure (PBM). Many generic PBMs have been developed to measure utility, for example, the EuroQol 5 Dimensions (EQ-5D) [4, 5], Health Utilities Index (HUI)[6, 7], and Short Form 6 Dimensions (SF-6D) [8]. On the other hand, in clinical studies, disease-specific profile-type instruments are often used to measure patients’ HRQL. However, profile-type measures cannot be used for economic evaluation because they are not preference-based and therefore do not measure utility.
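The arithmetic behind a QALY can be illustrated with a minimal Python sketch. The function name and the example profile are hypothetical and serve only to show how life years are weighted by utility on the 0 (dead) to 1 (full health) scale.

```python
def qalys(segments):
    """Sum utility-weighted life years over (utility, years) segments."""
    return sum(utility * years for utility, years in segments)


# Hypothetical profile: 2 years at utility 0.8, then 3 years at utility 0.5.
profile = [(0.8, 2.0), (0.5, 3.0)]
total = qalys(profile)
print(round(total, 2))
```

A year lived at utility 1 counts as one QALY, and time spent dead (utility 0) contributes nothing.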

In collaboration with the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group, the Multi-Attribute Utility in Cancer (MAUCa) Consortium has developed the EORTC QLU-C10D, a multi-attribute utility instrument (MAUI) derived from the EORTC Quality of Life Questionnaire-Core 30 (QLQ-C30) [9, 10]. The QLQ-C30 is the most widely used cancer-specific HRQL questionnaire [11]. But because the QLQ-C30 is a profile-type measure, it cannot be used to quality-adjust survival to calculate QALYs. The QLU-C10D was developed to enable quantification of utility from responses to the QLQ-C30. While mapping algorithms are available to derive utilities from QLQ-C30 responses through generic MAUIs [12], the QLU-C10D is potentially theoretically and empirically stronger because it comprises a descriptive system and a valuation method that complies with the Checklist for Reporting Valuation Studies [13], and aims to retain the cancer-specific sensitivity which is part of the QLQ-C30. Five of the 10 QLU-C10D dimensions capture symptoms and impacts of cancer and its treatments that are not explicitly included in generic instruments: nausea, fatigue, loss of appetite, and problems with sleep and bowel function. The other five dimensions are pain and four aspects of functioning (physical, role, social, and emotional). The QLU-C10D is not a stand-alone questionnaire; it is a MAUI that comprises a health state descriptive system plus country-specific preference weighting algorithms. Online Resource 1 shows the QLU-C10D descriptive system and explains how the 10 dimensions can be derived from 13 of the 30 items in the QLQ-C30.

The MAUCa Consortium has developed a standard protocol for evaluating the EORTC QLU-C10D in general population samples, as described in the Methods below. Using this method, value sets have been estimated for 11 countries so far, with more in progress [14,15,16,17,18,19,20,21,22]. This study aimed to apply this valuation method in a Japanese general population sample to produce Japan-specific utility weights and a value set for the QLU-C10D, and to compare the Japanese value set to those from other countries.

Methods

A cross-sectional population-based survey was designed to collect QLU-C10D valuation data from a representative sample of the Japanese general population; the study protocol was approved by the Japanese National Institute of Public Health ethics committee (approval number NIPH-IBRA #12272). The methods were consistent with previous QLU-C10D valuation studies [18, 19, 21, 23,24,25,26,27,28].

The survey was implemented by SurveyEngine, a company specialized in online choice experiments. SurveyEngine managed sample recruitment (via a Japanese online panel), survey administration, and data collection. SurveyEngine and its panel provider complied with the International Code on Market, Opinion and Social Research and Data Analytics [29]. The survey opened on 5th February 2021 and closed on 15th March 2021. Online panel members were eligible if they were aged ≥ 18 years and able to read and understand Japanese. Online panelists received an e-mail invitation to participate, including a hyperlink to the study and survey. Panel members who attempted to enter the survey via mobile phones were screened out as the discrete choice experiment (DCE) was too complex for a small screen. Consent was sought from the remainder, who were screened for quota sampling to ensure the age and sex distributions of the sample matched those of the Japanese general population (Table 1). Participants who consented and were within quota proceeded to further survey questions.

Table 1 Self-reported sociodemographic characteristics and health of the Japanese valuation survey sample (n = 2435 participants who completed 16 Discrete Choice Experiment choice sets) compared with the Japanese general population

A target sample size of ≥ 2000 respondents was determined to provide acceptable precision for model parameter estimates, based on the MAUCa consortium’s extensive experience with DCE valuation surveys and the number of health state comparisons in the QLU-C10D DCE [31,32,33,34]. This sample size was larger than most similar studies [35], and meets the various rules of thumb outlined by de Bekker-Grob et al. [36].

DCE valuation task

The feasibility of the implemented valuation task was previously established [10]. The valuation task involved choosing between pairs of hypothetical health states from the QLU-C10D; each pair formed a choice set. Online Resource 2 provides an example choice set from the Japanese survey. Each respondent was asked to consider 16 choice sets and indicate which health state they would prefer to live in until death. Each health state was described in terms of the ten dimensions of the QLU-C10D and a specified duration of survival (life years), which could take the values 1, 2, 5, or 10 years. Survival duration allowed the trade-off between QoL and life expectancy to be inferred, and enabled anchoring of utility scores at dead (zero life years) [31, 35].

The QLU-C10D health state classification system has over a million possible health states (\(4^{10}\) = 1,048,576). To determine which of these to include in the DCE, we constructed a designed experiment of 960 choice sets that maximized statistical efficiency of the utility model parameter estimation. The DCE contained 12 attributes: 11 attributes for the 10 QLU-C10D dimensions because two attributes were used to represent physical functioning (long and short walk); survival duration was included as the twelfth attribute to enable estimation on a health utilities scale. Twelve attributes is a relatively large number for respondents to consider simultaneously, so we simplified the cognitive task in three ways [10]: (1) we constrained the number of QLU-C10D dimensions that differed between health states in any given choice set to four; (2) we highlighted in yellow the four dimensions that differed within a choice set; (3) for the physical functioning dimension, the descriptors for levels 2 and 3 are conceptually complex, so to aid respondent comprehension, the two items (‘long walk’ and ‘short walk’) were presented separately in the survey but scored as one 4-level dimension in the DCE design. We successfully used this approach in all previous QLU-C10D valuation surveys [18, 19, 21, 23,24,25,26,27,28], confirming feasibility across 8 languages and 11 countries.
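The size of the state space and the four-dimension constraint can be illustrated with a short Python sketch. It draws pairs at random, which is an assumption made purely for illustration; the study used a statistically efficient designed experiment, not random sampling, and all function names here are hypothetical.

```python
import random

N_DIMENSIONS = 10  # QLU-C10D dimensions
N_LEVELS = 4       # levels per dimension


def random_state(rng):
    """Draw one health state as a tuple of levels (1 to 4) per dimension."""
    return tuple(rng.randint(1, N_LEVELS) for _ in range(N_DIMENSIONS))


def pair_with_k_differences(rng, k=4):
    """Build two states that differ on exactly k dimensions (illustrative only)."""
    a = random_state(rng)
    b = list(a)
    for d in rng.sample(range(N_DIMENSIONS), k):
        # Pick any level other than the one already used in state a.
        b[d] = rng.choice([lvl for lvl in range(1, N_LEVELS + 1) if lvl != a[d]])
    return a, tuple(b)


def n_differing(a, b):
    """Count the dimensions on which two states differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)
state_a, state_b = pair_with_k_differences(rng)
total_states = N_LEVELS ** N_DIMENSIONS
print(total_states, n_differing(state_a, state_b))
```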

The DCE used the same designed experiment as in previous QLU-C10D valuation studies; how it was constructed has been explained previously [18, 19, 21, 23,24,25,26,27,28]. The final DCE experimental design consisted of 960 choice sets, with an estimated D-efficiency of 90.4% relative to the best design with that level of overlap. There were three levels of randomization in the DCE component of the survey: (1) each respondent was randomized to answer 16 of 960 choice sets in the DCE design; (2) which option was presented as Situation A or Situation B was randomized within each choice set to mitigate any ordering bias; (3) the order of QLU-C10D dimensions was randomized for each person to prevent any order effect, with duration always presented as the last attribute.
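The three levels of randomization can be sketched in Python as follows. This is an illustration only, not the survey platform's implementation: it assumes the 16 choice sets are drawn by simple random sampling without replacement, and the function and field names are hypothetical.

```python
import random

N_DESIGN_SETS = 960
SETS_PER_RESPONDENT = 16
DIMENSIONS = [
    "physical functioning", "role functioning", "social functioning",
    "emotional functioning", "pain", "fatigue", "sleep", "appetite",
    "nausea", "bowel problems",
]


def assign_tasks(rng):
    """Return one respondent's randomized DCE presentation plan."""
    # (1) Draw 16 of the 960 choice sets (assumed: without replacement).
    set_ids = rng.sample(range(N_DESIGN_SETS), SETS_PER_RESPONDENT)
    # (3) Shuffle the dimension order once per respondent; duration stays last.
    dimension_order = rng.sample(DIMENSIONS, len(DIMENSIONS)) + ["duration"]
    # (2) Within each choice set, randomize which option is shown as Situation A.
    tasks = [{"set_id": s, "swap_a_b": rng.random() < 0.5} for s in set_ids]
    return {"dimension_order": dimension_order, "tasks": tasks}


plan = assign_tasks(random.Random(42))
print(plan["dimension_order"][-1], len(plan["tasks"]))
```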

Other survey content

The survey included several other components in the order shown in Fig. 1. These included sociodemographic characteristics and four validated self-reported health measures: the general health question from the 36-Item Short Form Health Survey (SF-36) [39], the EORTC QLQ-C30 [40], the Kessler 6 Psychological Distress Scale [41], and a preference-based generic health status measure, the 5-level version of the EQ-5D (EQ-5D-5L) [42, 43]. After completing the DCE component, participants were asked four fixed-format questions about the difficulty and clarity of the valuation task and the strategy used to choose between health states (Online Resource 3).

Fig. 1

Respondent flow and sample sizes for each component of the valuation survey

Statistical analyses

Descriptive statistics summarized sample demographics, self-reported general health, and participant feedback on the DCE valuation task. Sample representativeness was assessed against population reference data for demographics and self-reported general health using chi-square tests and t-tests.

Analysis of the DCE data followed the MAUCa consortium’s standard approach, as described previously for other QLU-C10D country-specific value sets [18, 19, 21, 23,24,25,26, 28]. This yields utility estimates consistent with standard QALY model restrictions by using a functional form we and others have used previously [10, 33, 34, 43, 44]. The QALY model requires that all health states have zero utility at dead [45]. This requirement is satisfied by Eqs. 1 and 2 because they include the interaction between the QLU-C10D levels and a TIME variable representing survival duration (life years). The designed experiment allowed for all these interactions. In Eqs. 1 and 2, as TIME tends to zero, the systematic component of the utility function tends to zero. Another requirement of the QALY model is constant proportional time trade-off; therefore, the relationship between utility and TIME (life years) was assumed to be linear.

A useful feature of this functional form is that the impact of moving away from Level 1 (no problems) in each HRQL dimension is characterized by the two-factor interaction term between the QLU-C10D levels and TIME. This enables a utility algorithm in which the effect of each level of each dimension is included as a decrement away from full health (which has a value of 1).
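The resulting decrement-based scoring can be sketched in Python. The decrements below are placeholders invented for illustration and are not the published Japanese weights; the function name and the convention of writing a health state as a 10-character string of levels are likewise assumptions.

```python
# Placeholder decrements (NOT the published Japanese weights): one table per
# dimension, mapping each level to its utility decrement relative to level 1.
PLACEHOLDER_DECREMENTS = [
    {1: 0.0, 2: -0.02, 3: -0.05, 4: -0.10} for _ in range(10)
]


def score_state(state, decrements=PLACEHOLDER_DECREMENTS):
    """Score a state such as '2111121111': start at 1 and add each decrement."""
    if len(state) != len(decrements):
        raise ValueError("state must have one level per dimension")
    utility = 1.0  # full health
    for level_char, table in zip(state, decrements):
        utility += table[int(level_char)]
    return utility


full_health = score_state("1111111111")
mild = score_state("2222222222")
worst = score_state("4444444444")
print(full_health, round(mild, 3), round(worst, 3))
```

With real country-specific decrements, the worst state can fall below zero, that is, be valued as worse than dead.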

We analyzed the DCE data with STATA 13.0 [46] in two ways. The primary analysis used conditional logit models (Eq. 1), in which the utility of option j in choice set s for survey respondent i was assumed to be

$$U_{isj}=\alpha\,\mathrm{TIME}_{isj}+\beta X_{isj}^{\prime}\,\mathrm{TIME}_{isj}+\varepsilon_{isj}$$
(1)

i = 1, …, I respondents; j = situations A, B; s = 1, …, 960 choice sets.

Here, α is the utility associated with a life year, \({X}_{isj}^{\prime}\) is a vector of dummy variables representing the levels of the QLU-C10D health state presented in option j, and β is the corresponding vector of utility weights associated with each level in each dimension within \({X}_{isj}^{\prime}\), for each life year. The error term \({\varepsilon }_{isj}\) was assumed to have a Gumbel distribution.
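As an illustrative aside (not an equation taken from the study protocol): if the error terms in Eq. 1 are additionally assumed to be independent and identically distributed across options, the Gumbel assumption gives the familiar logit form for the probability that respondent i prefers situation A to situation B in choice set s, where \(V_{isj}\) denotes the systematic part of Eq. 1:

$$P_{is}(A)=\frac{\exp(V_{isA})}{\exp(V_{isA})+\exp(V_{isB})},\qquad V_{isj}=\alpha\,\mathrm{TIME}_{isj}+\beta X_{isj}^{\prime}\,\mathrm{TIME}_{isj}$$

Because every term in \(V_{isj}\) is multiplied by TIME, \(V_{isj}\) equals zero at zero life years whatever the health state.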

Because each respondent assessed up to 16 choice pairs, we allowed for intra-individual correlation, using a clustered sandwich estimator to adjust the standard errors. We estimated utility decrements for each movement away from Level 1 (no problems) in each QLU-C10D dimension by dividing each β term by α [44], and used the delta method [47] in STATA to estimate standard errors and confidence intervals for these ratios.
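The ratio and its delta-method standard error can be sketched in Python. This is a minimal illustration rather than the STATA code used in the study; the function name and all numeric inputs are hypothetical, and the variance and covariance terms would in practice come from the cluster-robust covariance matrix of the fitted model.

```python
import math


def ratio_with_delta_se(beta, alpha, var_beta, var_alpha, cov_ab):
    """Return beta/alpha and its first-order (delta method) standard error."""
    ratio = beta / alpha
    # Partial derivatives of beta/alpha with respect to beta and alpha.
    d_beta = 1.0 / alpha
    d_alpha = -beta / alpha ** 2
    variance = (
        d_beta ** 2 * var_beta
        + d_alpha ** 2 * var_alpha
        + 2.0 * d_beta * d_alpha * cov_ab
    )
    return ratio, math.sqrt(variance)


# Hypothetical inputs, for illustration only.
decrement, se = ratio_with_delta_se(
    beta=-0.10, alpha=0.50, var_beta=0.0004, var_alpha=0.0009, cov_ab=0.0
)
lower, upper = decrement - 1.96 * se, decrement + 1.96 * se
print(round(decrement, 3), round(se, 4))
```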

We estimated two versions of Eq. 1. Model 1 included every decrement from the best level (i.e., Level 1, no problems) in each dimension within \({X}_{isj}^{\prime}\); thus, \({X}_{isj}^{\prime}\) contained 30 terms (i.e., 10 dimensions x (4-1) levels within each). Model 2 imposed a restriction of monotonicity in the levels of the dimensions of the QLU-C10D health state classification system by combining non-monotonic levels and re-estimating the model. Model 2 therefore included a reduced number of estimates in β (the vector of preference weights).
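A simple check for non-monotonic levels within one dimension can be sketched in Python. The helper below only flags adjacent levels that would need to be combined; it does not re-estimate the model as was done for Model 2, and its name and example coefficients are hypothetical.

```python
def non_monotonic_pairs(coefficients):
    """Flag adjacent level pairs whose coefficients are out of order.

    coefficients: estimates for levels 2, 3 and 4 of one dimension; level 1
    is the reference level with an implied coefficient of 0.
    """
    levels = [1, 2, 3, 4]
    values = [0.0] + list(coefficients)
    # A worse level should never have a higher (less negative) coefficient.
    return [
        (levels[i], levels[i + 1])
        for i in range(len(levels) - 1)
        if values[i + 1] > values[i]
    ]


# Hypothetical coefficients: level 3 is valued slightly better than level 2.
flagged = non_monotonic_pairs([-0.03, -0.02, -0.09])
print(flagged)
```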

We conducted unweighted and weighted analyses for all models. In weighted analyses, sampling weights controlled for non-representativeness in measured respondent characteristics using the iterative proportional fitting algorithm (i.e., raking) proposed by Deming and Stephan [48], and implemented in STATA using the ipfweight command. Variance inflation due to weighting was assessed by calculating the percentage increase in the standard errors of the unweighted versus weighted coefficients.
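The idea behind raking can be sketched in a few lines of Python. This is a simplified illustration of iterative proportional fitting, not the ipfweight implementation used in the study; the variable names, categories, and target proportions are invented for the example.

```python
def rake(rows, targets, n_iter=100):
    """Iteratively rescale weights so weighted margins match the targets.

    rows: list of dicts of categorical characteristics, one per respondent.
    targets: {variable: {category: population proportion}}.
    """
    weights = [1.0] * len(rows)
    for _ in range(n_iter):
        for var, target in targets.items():
            total = sum(weights)
            current = {}
            for w, row in zip(weights, rows):
                current[row[var]] = current.get(row[var], 0.0) + w
            # Scale each respondent so this variable's margin hits its target.
            weights = [
                w * target[row[var]] * total / current[row[var]]
                for w, row in zip(weights, rows)
            ]
    return weights


def weighted_share(rows, weights, var, category):
    """Weighted proportion of respondents in a given category."""
    total = sum(weights)
    return sum(w for w, r in zip(weights, rows) if r[var] == category) / total


# Hypothetical sample and population margins, for illustration only.
sample = (
    [{"income": "low", "education": "university"}] * 1
    + [{"income": "low", "education": "school"}] * 2
    + [{"income": "high", "education": "university"}] * 2
    + [{"income": "high", "education": "school"}] * 1
)
margins = {
    "income": {"low": 0.6, "high": 0.4},
    "education": {"university": 0.3, "school": 0.7},
}
w = rake(sample, margins)
print(round(weighted_share(sample, w, "income", "low"), 3))
```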

We compared utilities derived from the Japanese QLU-C10D algorithm with those from other countries in two ways. First, we scored four health states, ranging from very good to the worst possible health, with 12 country-specific utility algorithms. Second, we randomly generated 500 QLU-C10D health states, scored each with five country-specific algorithms, and plotted the scores by country, ordered according to the Japanese values.

The following three data quality assessment metrics were assessed. We tallied the number of respondents who chose either all As or all Bs across the choice sets, then re-estimated weighted Model 2 with their data excluded. We considered the time respondents took to complete the survey. We divided respondents into deciles based on total survey time, ran a conditional logit on the DCE data in each decile, then graphed the pseudo-R2 and the number of statistically significant coefficients for each decile, interpreting low values on either indicator as suggesting relatively low quality data.
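Two of these checks can be sketched in Python: flagging respondents who always chose the same option, and assigning respondents to deciles of completion time. The function names and the rank-based decile rule are assumptions made for illustration, and the per-decile conditional logit models are not reproduced here.

```python
def is_flatliner(choices):
    """True if a respondent picked the same option in every completed set."""
    return len(choices) > 0 and len(set(choices)) == 1


def completion_deciles(times):
    """Assign each respondent a decile (0 = fastest, 9 = slowest) by rank."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    deciles = [0] * len(times)
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // len(times)
    return deciles


# Hypothetical completion times in minutes, slowest respondent listed first.
times = [float(t) for t in range(20, 0, -1)]
deciles = completion_deciles(times)
print(is_flatliner(["A"] * 16), deciles[0], deciles[-1])
```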

Results

Sample characteristics

As Fig. 1 shows, 3513 respondents entered the survey, 2662 (76%) of whom were within sampling quotas, consented and completed at least one choice pair, and 2435 (69%) completed all choice pairs. The data from these 2435 participants were included in analyses to assess representativeness and estimate the Japanese value set.

The sample characteristics (n = 2435) are compared to published Japanese general population characteristics in Table 1. Study participants were representative in terms of sex, age, and paid employment. Our study team discussed the type and degree of non-representativeness of the remaining variables and agreed to include four variables in raking (weighting): household income, education, health status (EQ-5D), and mental health (Kessler 6). The three remaining demographics that were non-representative were not included in raking for the following reasons: Region—the Japanese population is generally homogeneous across regions; Work status—correlated with household income, which was included in raking; Relationship status—differences per category were small (< 3.2%).

Respondents’ perception of the DCE valuation task

Online Resource 3 details respondent perceptions of the DCE valuation task. In summary, 44% rated the health state presentation as ‘unclear’ or ‘very unclear,’ and 23% found it ‘clear’ or ‘very clear.’ Regarding the choice task, 66% found it ‘difficult’ or ‘very difficult’ to choose between pairs of health states, and only 7% found it ‘easy’ or ‘very easy.’ With regard to the strategy participants used to choose between pairs of health states, 32% focused on aspects highlighted in yellow, 26% considered most aspects, and 25% focused on just a few aspects. Of the 103 participants who provided additional detail on their strategy, length of survival time was considered by 65/103 (51%) when choosing between health states. Acceptability of burden to themselves was cited by 22/103 (17%), and burden to others was cited by 18/103 (14%). Specific symptoms (pain, appetite, sleep) were cited by a small number of respondents (8, 4, and 4, respectively).

Data quality

Online Resource 4 details the data quality findings. When data from the 73 respondents who gave either all As or all Bs across their completed choice sets were excluded, there was little difference (max absolute difference of 0.0042) and no evidence of bias (mean difference of −0.00054) in coefficient estimates. Median survey completion was 12.5 min, minimum 3.75 min, and maximum 69.33 min. Respondents in all completion time deciles sped up as they became more familiar with the choice task (Figure A). The fastest completion time decile yielded the fewest statistically significant coefficients (6/31) and the slowest two deciles yielded the most (26/31 and 25/31, respectively) (Figure B). While this suggested slower respondents produced less random data, the pseudo-R2 values were similar across deciles.

DCE Results

Conditional logit results for the 2435 respondents who completed all 16 choice pairs are presented in Table 2. In the unweighted Model 1 analysis, all coefficients are negative and increase in absolute terms with progressively higher levels. The dimensions with the largest impact (based on the largest absolute coefficient) are physical functioning, pain, and role functioning. When responses were weighted, some small non-monotonicities were observed in the trouble sleeping dimension. The effect of combining levels to prevent this (Model 2) was small. Figure 2 shows the impact on the coefficients of enforcing monotonic ordering (Panel A) and of using weights (Panel B). Both panels show a line of best fit between models with and without these adjustments, as well as a 45-degree line indicating equality. All data points are close to the 45-degree line, illustrating that these adjustments had minimal impact and that the preference weights are robust to them. The standard errors of Model 1 coefficients in weighted analyses were on average 48% larger (minimum 27%, median 46%, maximum 77%). The combined effect of weighting on coefficient estimates and variance inflation reduced the level of statistical significance of three coefficients from 5% to non-significant (Social L2, Emotional L2, Pain L2), one from 1% to non-significant (Emotional L3), and three from 0.1% to 5% (Role L2, Trouble Sleeping L3, Nausea L2). In one case, it increased statistical significance (Bowel problems from not significant to 5% to 1%).

Table 2 Conditional logit results for Model 1 (unconstrained) and Model 2 (monotonicity imposed), unweighted and weighted analyses (estimated coefficients and robust standard errors (SE)), based on data from respondents who completed all 16 choice pairs in the discrete choice experiment (n = 2435)

Fig. 2

Impact of imposing monotonicity (ordering) and weighting: scatterplots of preference weights from conditional logit models (n = 2435)

As a further robustness check, the same analyses were run including all DCE data (n = 2,662 respondents who completed at least one choice set) and the subset who completed all 16 choice pairs and all subsequent demographics (n = 2,312). As shown in Online Resources 5 and 6, results for these subsets were very similar to those in Table 2 (n = 2435), i.e., the same to two decimal places in all cases and to three decimal places in most cases.

We calculated the QLU-C10D preference weights from the unweighted Model 1 results because they were fully monotonic, and because weighting did not change the coefficient estimates much but did increase standard errors considerably. These are plotted in Fig. 3 and tabulated under the graph. As these are derived by dividing through the coefficients in Table 2 by the duration coefficient, the pattern is unchanged, with physical functioning, pain, and role functioning the largest drivers of preference. We recommend these preference weights be used in the Japanese QLU-C10D scoring algorithm, provided in Online Resource 7, including syntax for STATA and SPSS.

Fig. 3

Japanese QLU-C10D preference weights for each dimension and level (Model 1 conditional logit, unweighted)

Japanese value set compared with other countries

The first comparison was based on four health states representing a range of health from very good to worst possible, with utility scores based on 12 country-specific utility algorithms (Fig. 4). For the best of these health states (with just a little physical functioning impairment and pain, 2111121111), the Japanese utility score ranked 8th of 12. For the health state with a little impairment in all domains (2222222222), the Japanese utility score ranked 11th, and for the health state with quite a bit of impairment in all domains (3333333333), the Japanese utility score was the lowest (rank 12/12). For the worst possible health state (very much impairment in all domains, 4444444444), the Japanese score (−0.221) was ranked 11th, with only France having a lower value (−0.44).

Fig. 4

Comparison of Japanese utility scores for 4 health states with those using scoring algorithms from 11 other countries

The second comparison, based on 500 randomly generated health states, compared the Japanese value set with those of two English-speaking countries (the United Kingdom (UK) and the United States (US)) and two European countries (Spain and France) (Fig. 5). Across these health states, the Japanese values tend to lie above those from France, but below those from Spain, the UK, and the US. This suggests Japanese respondents were generally more willing to give up life expectancy for improved health than respondents in the latter three countries, but less willing than the French respondents. However, the pronounced oscillations in the lines for the four other countries indicate further complexity in the between-country story, due to variations between countries in dimension-specific preference weights.

Fig. 5

QLU-C10D health state values for Japan and four other countries

Discussion

This study provides the Japanese value set for the EORTC QLU-C10D, endorsed by the EORTC Quality of Life Group. The largest utility weights were associated with decrements in physical functioning, role functioning, and pain. Intermediate utility weights were associated with decrements in social functioning and nausea, while the remaining symptoms and emotional functioning were associated with smaller utility decrements. Compared with the QLU-C10D value sets from other countries, the Japanese decrements in social functioning, fatigue, and appetite were the largest of the 12 countries where QLU-C10D value sets have been established. The level 4 decrements in role function and nausea were the second largest among these countries. Generally, the Japanese weights of symptom-related items were larger than the average of the 12 countries, except for pain, where it was among the smaller. In addition, the comparison based on 500 randomly generated health states revealed considerable heterogeneity among countries, reflecting variations in dimension-specific preference weights in country-specific value sets. Different dimensions may play different roles in different cultures, but the observed variations may also be due in part to linguistic non-equivalence between countries; irrespective, these justify the need for country-specific value sets.

The value of the worst health state was −0.221, which was lower than that seen in most other existing QLU-C10D country-specific value sets; only France had a lower value. This is a surprising result compared to the Japanese EQ-5D-5L value set. The worst Japanese EQ-5D-5L index [55555] was −0.025 [49], which is the highest value in the world. This may be due to a key difference in the valuation methodologies used to generate the two value sets: the EORTC QLU-C10D value set was derived from a DCE with duration, while the EQ-5D-5L value set was derived from composite time trade-off (cTTO) tasks administered by in-person interview. According to the Japanese EQ-5D-5L value set, the Japanese are reluctant to trade health states with death, suggesting a strong risk-aversion to death. By contrast, the international comparison of the EORTC QLU-C10D value set suggests that the Japanese willingly trade life years for better health. This may reflect Japanese preferences: compared with people in Western countries, good health may be preferred over longer life, while death is less acceptable. Of course, it is possible that methodological artifacts contributed to the inconsistency. One of these is the method of preference elicitation: the QLU-C10D DCE was conducted as a self-complete survey while the EQ-5D-5L was interviewer administered. Another is translation effects in the source instruments (QLQ-C30 and EQ-5D-5L) and/or in the preference elicitation tasks (the DCE questionnaire for the QLU-C10D versus the cTTO for the EQ-5D-5L) that somehow differentially distorted the Japanese preference task of the QLU-C10D relative to that of the EQ-5D-5L.

EQ-5D-5L is now a standard instrument for utility assessment in Japan. However, in clinical trial settings, the collection of EQ-5D-5L is sometimes omitted, and some studies include only disease-specific HRQL instruments. The QLQ-C30 and FACT-G are frequently used, particularly in cancer contexts [50]. Before the QLU-C10D was created, data obtained using the QLQ-C30 could not be used directly to calculate the QALY. Therefore, when QLQ-C30 data were used for cost-effectiveness analysis, mapping from the QLQ-C30 to PBMs was sometimes used. Although a mapping algorithm from the QLQ-C30 to the EQ-5D-5L has been established in Japan [51], mapping is not necessarily recommended for estimating utility, as there is considerable uncertainty around such calculations, particularly at the extremes of the utility scale. In the Japanese HTA guidelines, mapping is only allowed if utility data cannot be obtained by other methods. As the QLU-C10D is now an established PBM, its use is more acceptable than that of the mapping algorithm. Also, because the EQ-5D-5L is a generic PBM, the QLQ-C30 is likely to be more sensitive to changes and differences in the health states of patients with cancer, and may therefore capture the utility of patients with cancer more appropriately and calculate the cost per QALY more precisely. Finally, many existing clinical trials have collected QLQ-C30 data in Japan. Such accumulated data can now be converted to utilities using the scoring algorithm generated by our research, and thereby provide HRQL weighting for QALY calculation. Given these advantages, the Japanese value set for the EORTC QLU-C10D has many benefits for academics and the Japanese HTA system.

This study has several strengths. First, the Japanese value set was established in a large sample representative of the general population in terms of age and sex, enhancing the generalisability of results. Second, only one inconsistency was observed in all the weights (between the second and third levels of the “Trouble Sleeping” dimension). This is the lowest number of inconsistencies yet for QLU-C10D valuation studies, suggesting the Japanese survey was of high quality. Third, as our survey was based on the standard international protocol of the MAUCa Consortium, it facilitated international comparison with other country-specific QLU-C10D value sets. This study also has some limitations. Some respondents may not have engaged as fully in the online choice task as they would have in a face-to-face survey. Also, respondents were not selected by random sampling from the entire Japanese population but by quota sampling from an online panel. Some characteristics of the respondents were statistically different from the Japanese population norms (e.g., region and work); we adjusted for most of these using raking, a form of sample weighting that allows weighting by several variables simultaneously. Further, the survey was conducted during the COVID-19 pandemic, which had a substantial impact on life in Japan. Other authors have noted that the pandemic did not impair the ability to conduct online surveys such as this one, and indeed that use and development of online research increased during the pandemic [52]. However, it is unknown whether preferences for the health attributes assessed in our study differed during the pandemic from pre- and post-pandemic preferences. There is limited evidence about the impacts of the COVID-19 pandemic on how people value health, and the policy implications of any such effects are unclear [53].

Conclusion

This study employed data from approximately 2,500 Japanese general population respondents who completed a DCE task based on an international valuation protocol developed for the EORTC QLU-C10D by the MAUCa Consortium and the EORTC Quality of Life Group. This produced the EORTC-endorsed Japanese value set for the EORTC QLU-C10D, which has some distinguishing characteristics compared to existing country-specific QLU-C10D value sets. Fundamentally, this study promotes economic evaluations in Japan and the development of HTA systems that produce transparent, consistent and defensible decisions around health and healthcare.