Key Points for Decision Makers

The instruments that are currently available to assess quality of life in mycosis fungoides cutaneous T-cell lymphoma (MF-CTCL) do not provide health state utility values (HSUVs).

The application of existing mapping algorithms to derive HSUVs in this patient population yielded unreliable estimates.

More research is needed to develop mapping algorithms using disease-specific instruments (e.g., MF/SS-CTCL QoL).

1 Background

Cutaneous T-cell lymphomas (CTCLs) are a group of rare subtypes of non-Hodgkin lymphomas that primarily involve the skin and account for approximately 2% of all lymphomas. Mycosis fungoides (MF) is a low-grade cutaneous lymphoma encompassing more than half of primary CTCL cases, with an incidence rate of around 5.6 per million persons and a median age at diagnosis of 55–60 years. The choice of treatment depends on the patient’s comorbidities and disease staging [1]. In MF-CTCL patients with limited/localised skin involvement, the National Comprehensive Cancer Network (NCCN) Guidelines recommend topical mechlorethamine hydrochloride (MCH, or nitrogen mustard) as a primary skin-directed treatment option [2]. However, there is currently no curative treatment for MF-CTCL, and the main treatment objective is to reach effective palliation with symptom improvement and enhance the patient’s quality of life (QoL) [1]. Indeed, patients with CTCLs experience several symptoms affecting their daily life, such as skin sensitivity, itching, annoyance about the disease, worry that it could worsen, and impairment in sexual life [3]. Therefore, the use of patient-reported outcome measures (PROMs) to measure the self-perceived health status and QoL is essential in CTCLs [3].

The only instrument measuring QoL specifically in MF or Sézary syndrome (SS) subtypes of CTCLs (MF/SS-CTCLs) is the MF/SS-CTCL QoL, for which a total score is calculated by summing the patient’s scores on the 12 MF/SS-CTCL QoL items [4]. Other PROMs, either skin-specific, pruritus-specific, or cancer-specific, are also suitable to address CTCL symptomatology [4]. Among skin-specific questionnaires, the Dermatology Life Quality Index (DLQI) is a simple 10-item questionnaire for routine clinical use in dermatology [5]. The more recent Skindex is an instrument that studies the effects of a wide variety of skin diseases on patients’ QoL. The original 29-item version (Skindex-29) inquires about how often (never, rarely, sometimes, often, all the time) during the previous 4 weeks the patient experienced the effect described in each item. Seven items address the ‘symptoms domain’, 10 items address the ‘emotional domain’, and 12 items address the ‘functioning domain’. All responses are transformed to a linear scale of 100, varying from 0 (no effect) to 100 (effect experienced all the time) [6]. Skindex-29 showed high correlation with MF/SS-CTCL QoL [4]. A shorter 16-item version (Skindex-16) was developed to measure bother rather than frequency of symptoms, and to reduce respondent burden [6]. Among pruritus-specific questionnaires, the Visual Analogue Scale (VAS) has been considered a valuable technique for assessing pruritus [7], in addition to the 22-item ItchyQoL [8] and the 5-D itch scale [9], which both measure QoL in patients with chronic pruritus. Lastly, European Organisation for Research and Treatment of Cancer (EORTC) questionnaires (https://qol.eortc.org/) and Functional Assessment of Cancer Therapy-General (FACT-G; https://www.facit.org/FACITOrg) can be applied to patients with CTCLs to investigate cancer-specific issues.
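As an illustration of the Skindex-29 transformation described above, the following minimal sketch converts frequency responses to the 0–100 scale and averages them within a domain. The equal 25-point spacing between response options and the use of a simple mean as the domain score are assumptions made for illustration; the names `FREQUENCY_SCORES` and `domain_score` are hypothetical and do not come from the instrument's documentation.

```python
# Assumed equal spacing of the five frequency options on a 0-100 scale.
FREQUENCY_SCORES = {
    "never": 0,
    "rarely": 25,
    "sometimes": 50,
    "often": 75,
    "all the time": 100,
}


def domain_score(responses):
    """Return the mean 0-100 score for the items of one domain.

    `responses` is a list of frequency labels, one per answered item.
    Averaging the item scores is an assumption of this sketch.
    """
    scores = [FREQUENCY_SCORES[r] for r in responses]
    return sum(scores) / len(scores)


# Hypothetical answers to the seven 'symptoms domain' items.
symptom_items = ["never", "rarely", "sometimes", "often",
                 "all the time", "sometimes", "sometimes"]
symptoms = domain_score(symptom_items)
```

Higher values indicate a more frequent effect of the skin disease within that domain.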

However, none of these PROMs is provided with a preference-based algorithm converting responses into health state utility values (HSUVs) for quality-adjusted life-year (QALY) calculations. In several jurisdictions, the most common technique used to inform drug coverage and reimbursement decisions is the cost-effectiveness analysis, which generally expresses results in terms of incremental cost per QALY gained. Therefore, the lack of collection of preference-based PROMs in a clinical study might be an issue. In the UK, the National Institute for Health and Care Excellence (NICE) recommends that QALYs are used as a measure of outcome for economic evaluation, and that the EuroQol-5 Dimension (EQ-5D) is the preferred measure of health-related utility to calculate QALYs. However, the institution recognises that EQ-5D data may not always be available to manufacturers producing submissions and reports, and thus ‘mapping’ can be used to predict them from other measures of health. Mapping is defined as the development and use of an algorithm (or algorithms) to predict HSUVs through regression analyses using data from any indicator or measures of health [10, 11].

The PROVe study is a prospective, observational, US-based study conducted in patients diagnosed with MF-CTCL and treated with Valchlor®. Valchlor® gel is a new formulation of MCH (or nitrogen mustard) that has been shown to be well tolerated and effective in a clinical trial [12]. The PROVe study collected information in a ‘real-world’ clinical setting on the management and outcomes of MF-CTCL patients treated with Valchlor®. In detail, 301 adult patients (≥ 18 years of age) actively using Valchlor® were enrolled at 41 US sites (March 2015–July 2017) and were monitored for up to 2 years [13]. Data collected included clinical information, healthcare utilisation, adverse events, and treatment patterns. The primary endpoint was the proportion of patients with ≥ 50% reduction from baseline in percentage of body surface area of disease. QoL was assessed as a secondary endpoint by using Pruritus-VAS (scale 0–10, where 0 indicates no pruritus and ≥ 9 indicates very severe pruritus), Skindex-29, and the newly developed MF/SS-CTCL QoL, none of which is preference-based or yields HSUVs. Thus, the aims of the current study were to derive HSUVs in MF-CTCL by applying any mapping algorithms that used one of the three PROMs adopted in the PROVe study, and to assess the feasibility of this approach by comparing mapped utilities with the HSUVs estimated in the literature for MF-CTCL patients.

2 Methods

2.1 Literature Review

We searched PubMed, the School of Health and Related Research Health Utility Database (ScHARRHUD), and the Health Economics Research Centre (HERC) database of mapping studies (version 7.0) [14, 15] to identify (1) studies mapping any of the three instruments adopted in the PROVe study onto preference-based PROMs; and (2) studies estimating HSUVs in CTCL patients. In PubMed, we used two different search strings in all fields. The former included the terms ‘mapping’ AND (‘MF/SS-CTCL QoL’ OR ‘Skindex-29’ OR ‘Pruritus VAS’), while the latter included (‘cutaneous T-cell lymphoma’ OR ‘mycosis fungoides’) AND (‘standard gamble’ OR ‘time trade-off’ OR ‘person trade-off’ OR generic preference-based PROMs denominations [16] i.e., ‘EQ-5D’, ‘HUI2/3’, ‘SF-6D’, ‘AQoL’, ‘15D’ and ‘QWB’). In using these terms, we considered that some instruments might be spelt in different ways (e.g., EQ-5D, EQ5D, EuroQol). The last date for database searching was 2 November 2021.

2.2 Application of Mapping Algorithms

Only one mapping study [17] resulted from the HERC database and the first search string used in PubMed. This was a cross-sectional survey that collected Pruritus-VAS and EQ-5D-3L in a sample (n = 268) of the general population in South Korea. EQ-5D-3L responses were converted into HSUVs by applying the Korean value set. Thereafter, three 2-level models mapping Pruritus-VAS onto EQ-5D-3L were developed and tested using in-sample cross-validation. Among these models, based on goodness of fit and model simplicity, the authors preferred Model 2, which used age, age squared, sex, and Pruritus-VAS as independent variables. Based on this study, we applied Model 2 (i.e., EQ-5D-3L utility = 1.37778 − 0.00807 × Pruritus-VAS − 0.01082 × age + 0.00013 × age² + 0.00145 × sex) to patient-level data collected in the PROVe study to derive EQ-5D utility values. We also applied Model 3 (i.e., EQ-5D-3L utility = 1.17954 − 0.00800 × Pruritus-VAS), which used only the Pruritus-VAS as an independent variable. Conversely, we disregarded Model 1, which included some demographic variables that were not available in the PROVe study. We assumed that missing VAS data were missing at random (MAR) and used multiple imputation (MI) to handle them. We fitted a linear regression imputation model (mi impute regress) using statistically significant covariates, generated 10 imputed datasets, and calculated mean imputed VAS scores [18]. We computed descriptive statistics of mapped HSUVs in the whole sample and in relevant subgroups, where mean values were compared using the t-test and one-way analysis of variance (ANOVA) [19]. Although the distributions of mapped EQ-5D utility values were not completely normal, the sample size was sufficiently large to allow the comparison of subgroups using parametric tests [20, 21]. We presented results for both the multiply imputed dataset and complete-case (CC) analyses. All analyses were conducted using Stata 16 (StataCorp LLC, College Station, TX, USA).
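The two mapping equations can be written out as a short sketch. The coefficients are those reported above for Models 2 and 3 of Park et al. [17]. The function names are hypothetical, the Pruritus-VAS is assumed to be on the 0–100 scale used by the algorithms (hence the rescaling helper for 0–10 scores), and `sex` is assumed to be a 0/1 indicator whose coding must follow the original publication, since it is not restated here.

```python
def rescale_vas(vas_0_10):
    """Convert a Pruritus-VAS score from the 0-10 scale to the 0-100 scale."""
    return vas_0_10 * 10.0


def model3_utility(vas_0_100):
    """Model 3: EQ-5D-3L utility predicted from Pruritus-VAS alone."""
    return 1.17954 - 0.00800 * vas_0_100


def model2_utility(vas_0_100, age, sex):
    """Model 2: utility from Pruritus-VAS, age, age squared, and sex.

    `sex` is assumed to be a 0/1 indicator (coding as in Park et al.).
    """
    return (1.37778
            - 0.00807 * vas_0_100
            - 0.01082 * age
            + 0.00013 * age ** 2
            + 0.00145 * sex)


# Hypothetical patient: VAS of 3 on the 0-10 scale, aged 60, sex indicator 1.
vas = rescale_vas(3)
u3 = model3_utility(vas)
u2 = model2_utility(vas, age=60, sex=1)
```

Neither function bounds its output, so low Pruritus-VAS scores can return utilities above 1. With no pruritus (VAS = 0), Model 3 returns its intercept, 1.17954.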

3 Results

3.1 Study Sample

The characteristics of the 298 evaluable patients (of 301 enrolled) in the PROVe study are shown in Table 1. The mean age was 61.7 years, 60.1% were men, 68.1% were White, and stage IA was the most prevalent cancer stage (41.9%). The 298 patients provided 1441 Pruritus-VAS scores over a total of 2097 visits; missing scores (n = 656, 31.3%) were imputed using age and visit number (from 0 to 20) as predictors in linear regression, since both were significant predictors of missingness in logistic regression. The pattern of missingness over the study period is shown in Electronic Supplementary Table A1. The average VAS scores, converted from a 0–10 to a 0–100 scale to allow application of the algorithms of Park et al., were 28.73 in CC and 28.56 with MI. Mean VAS scores showed small, non-significant oscillations over the study period (Fig. 1).
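The imputation step can be illustrated with a simplified sketch. It fits a linear regression of VAS on age and visit number using observed records, fills each missing score with the prediction plus a random residual, repeats this 10 times, and averages the results. This is a stochastic regression imputation written in plain Python for illustration only; it is not the Stata mi impute regress routine, which additionally draws the regression parameters from their posterior distribution. All function names and the example records are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return OLS coefficients and residual standard deviation."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sd = (sum(e * e for e in resid) / (len(y) - k)) ** 0.5
    return beta, sd


def impute_vas(records, n_imputations=10, seed=0):
    """records: (age, visit, vas or None); returns mean VAS across imputations."""
    rng = random.Random(seed)
    observed = [(a, v, s) for a, v, s in records if s is not None]
    beta, sd = fit_ols([[1.0, a, v] for a, v, _ in observed],
                       [s for _, _, s in observed])
    datasets = []
    for _ in range(n_imputations):
        filled = []
        for age, visit, vas in records:
            if vas is None:
                # Prediction plus a random residual draw.
                vas = beta[0] + beta[1] * age + beta[2] * visit + rng.gauss(0, sd)
            filled.append(vas)
        datasets.append(filled)
    return [sum(col) / n_imputations for col in zip(*datasets)]


# Hypothetical records on the 0-100 VAS scale; None marks a missing score.
records = [(55, 0, 30.0), (55, 1, 20.0), (70, 0, 40.0), (70, 2, 35.0),
           (62, 3, 10.0), (62, 1, None), (48, 4, None)]
mean_vas = impute_vas(records)
```

Observed scores are left unchanged, so only the records with missing scores differ between the imputed datasets.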

Table 1 Characteristics of MF-CTCL patients (n = 298) enrolled in the PROVe study
Fig. 1 Pruritus-VAS (0–100) by study visit. ANOVA analysis of variance, min minimum, max maximum, VAS Visual Analogue Scale

3.2 Mapped EuroQol-5 Dimension (EQ-5D) Utility Values

Table 2 shows the descriptive statistics and Fig. 2 shows the related histograms of the EQ-5D utility values derived from the application of the two mapping algorithms by Park et al., for both CC and MI analyses. The average mapped utilities were equal to 0.950 (Model 3) and 0.999 (Model 2) in CC analysis, and 0.951 (Model 3) and 0.999 (Model 2) after MI.

Table 2 Summary statistics of mapped EQ-5D utility values across all visits
Fig. 2 Distribution of mapped EQ-5D utilities across all visits. pref. preferred

Average mapped utilities were also stratified by relevant factors. No significant differences were observed by sex (Electronic Supplementary Table A2) and visit number from 0 to 20 (Electronic Supplementary Table A3). Conversely, there were significant differences (p < 0.05) across age groups, with different patterns for Model 3 and Model 2, since the latter included age among the independent variables (Electronic Supplementary Table A4), as well as across races, with White and Asian subjects presenting higher values (Electronic Supplementary Table A5), and tumour stages, with stages IA/IIA presenting higher values than more advanced stages (Electronic Supplementary Table A6).

3.3 Literature Values

The mapped HSUVs obtained from the algorithms of Park et al. were compared with the HSUVs reported in the literature for MF-CTCL patients. Five studies were retrieved from the second search string used in PubMed and were deemed suitable for this purpose (Table 3). First, a catalogue of dermatology utilities elicited using time trade-off (TTO) in direct patient interviews reported mean values of 0.867 for MF-CTCL and 0.820 for cutaneous lymphoma in general [22]. Second, a prospective, non-blinded, survey-based study [23] that collected Skindex-16 and EQ-5D scores from patients with CTCLs reported an average EQ-5D utility of 0.83. Third, a cross-sectional survey [24] was performed in a sample of 67 MF/SS-CTCL patients who were asked to fill in the generic preference-based Health Utility Index Mark 3 (HUI3). The overall HUI3 score was 0.68 for the whole CTCL sample, 0.69 for MF, and 0.63 for SS. Moreover, patients with early-stage CTCL scored higher, on average, than those with advanced-stage disease (0.72 vs. 0.56, respectively). Fourth, a randomised phase III trial (ALCANZA [25]) used EQ-5D to examine QoL in CTCL patients randomised to receive brentuximab vedotin or physician’s choice (methotrexate/bexarotene) and obtained average utility values at baseline ranging between 0.63 and 0.78 (depending on the country value set adopted). Fifth, in a recent US-based cohort study [26], 115 MF/SS outpatients were asked to fill in the HUI3 and showed a significant reduction in health utility compared with the controls (0.64 vs. 0.78).

Table 3 Health state utility values in published mycosis fungoides cutaneous T-cell lymphoma studies

In comparison with these literature results, mapped utility from Pruritus-VAS appeared largely overestimated, since the difference between the highest estimate in the literature (0.867 in MF [22]) and the lowest mean mapped utility from this study (0.950 from Model 3) is equal to 0.083, and even larger (0.12) if referring to the highest EQ-5D estimate (0.83 in CTCLs [23]).

4 Discussion

The use of mapping is becoming popular in estimating HSUVs for cost-effectiveness analyses [10]. Overall, mapping introduces a degree of uncertainty in the estimated HSUVs and should be considered as a second-best approach compared with the direct collection of preference-based PROMs [10, 27]. However, generic PROMs yielding HSUVs are considered not sensitive enough to capture relevant changes in symptomatology over a treatment period, and disease-specific PROMs are usually preferred to measure QoL in patients recruited in clinical studies [10]. Moreover, the administration of multiple questionnaires within the same study may be too burdensome. The use of generic PROMs is particularly unlikely in studies on rare diseases, to which MF-CTCL also belongs, with an incidence of 0.59 per 100,000 [28]. Indeed, in rare diseases, the symptoms experienced by patients are usually more severe and heterogeneous than in common conditions, and EQ-5D has been shown to miss relevant patients’ concerns, such as fatigue, relationship/social life, and comorbidities [29]. In the absence of the collection of preference-based PROMs, the mapping technique has been increasingly accepted to inform reimbursement decisions of novel drugs and has recently been explored in the literature on rare diseases [30]. For example, in 2017, NICE recommended the use of carfilzomib in multiple myeloma, which is another rare cancer with an incidence of 6 per 100,000 [28], based on HSUVs derived from the application of a mapping algorithm [31] to trial EORTC data.

In this study, we used mapping algorithms to derive HSUVs for a US-based clinical study (PROVe) in MF-CTCL. The HERC database yielded only one study mapping Pruritus-VAS onto EQ-5D-3L [17]. From this study, we selected two (of three) algorithms to be applied to patient-level data collected in the PROVe study and converted Pruritus-VAS scores into EQ-5D-3L utilities. As expected, higher VAS scores (indicating worse pruritus) resulted in lower HSUVs. In subgroup analyses, we observed significant differences in average mapped utilities by age, race, and cancer stage, and no significant differences by visit number or sex. However, the applied algorithms largely overestimated HSUVs and predicted utilities above 1, Model 2 to a larger extent than Model 3, although the former was the algorithm preferred by Park et al. [17]. In CC analysis, 51.5% and 57.2% of all mapped utility values were above 1 when applying Model 3 and Model 2, respectively; after MI, the corresponding proportions were 42.8% and 54.1%. The average mapped EQ-5D utilities ranged between 0.950 and 0.999, depending on the algorithm applied and the imputation (or not) of missing values. Such values are considerably higher than the mean HSUVs reported by previous studies, which ranged between 0.51 and 0.87, depending on the CTCL type (MF or SS) and stage (early or advanced), and likely on the technique adopted to estimate them [22,23,24,25,26]. For example, it has been shown that direct methods such as TTO or standard gamble tend to provide higher HSUVs than preference-based instruments such as the EQ-5D and HUI [32]. This phenomenon was also observed in our review, where the only study using TTO provided the highest mean value (0.87) among the studies retrieved, although estimated from a very small sample [22].
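The origin of the out-of-range predictions can be made explicit with a small worked sketch based on the Model 3 equation given in the Methods. Solving 1.17954 − 0.00800 × VAS > 1 shows that Model 3 predicts a utility above 1 for any Pruritus-VAS score below about 22.4 on the 0–100 scale. The function names below are hypothetical, and the example utilities are invented purely to show how the proportion above 1 can be computed.

```python
MODEL3_INTERCEPT = 1.17954
MODEL3_VAS_SLOPE = 0.00800


def model3_vas_threshold(utility_cap=1.0):
    """VAS (0-100) below which Model 3 predicts a utility above `utility_cap`."""
    return (MODEL3_INTERCEPT - utility_cap) / MODEL3_VAS_SLOPE


def share_above(utilities, cap=1.0):
    """Proportion of mapped utilities strictly greater than `cap`."""
    return sum(u > cap for u in utilities) / len(utilities)


threshold = model3_vas_threshold()

# Invented mapped utilities, for illustration only.
example_utilities = [0.90, 1.05, 1.10, 1.00]
proportion = share_above(example_utilities)
```

Because the threshold lies well inside the observable range of the scale, any sample with many low pruritus scores will produce a substantial share of utilities above 1 under this algorithm.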

The tendency of available algorithms to predict HSUVs above 1 has been previously reported in the mapping literature, but there is no consensus on how to deal with this issue and some studies have simply used the unadjusted mapped utility data [27, 33]. The application of algorithms developed in common diseases to their rare variants has shown even more inaccuracies due to the greater severity of the latter [30]. For example, Arnold et al. [34] showed that the available algorithms tended to overpredict HSUVs in patients with pleural mesothelioma, who are generally in poorer health compared with more common neoplasms (e.g., lung cancer) where the original algorithms were developed.

The results obtained in this study require some considerations. First, of the three PROMs collected in the PROVe study, the Pruritus-VAS might be the least suitable to be mapped onto EQ-5D, due to the mono-dimensionality of VAS compared with the other two scales (i.e., Skindex-29 and MF/SS-CTCL QoL).

Second, the Park et al. study used data and the EQ-5D-3L value set from the general population in South Korea, which may not be representative of the US population since HSUVs are likely to be affected by cultural differences among countries. In addition, the general population may have no experience of pruritus and may therefore under- or overestimate the HSUVs of those affected by this chronic symptom [17], such as MF-CTCL patients. However, the unavailability of a specific mapping algorithm for MF-CTCL patients is not surprising, given the rarity of this condition. Moreover, the study by Park et al. did not follow any specific recommendations (e.g., the MAPS Statement [35]) for generating the algorithms, and we did not perform any quality assessment of the mapping exercise.

Third, the mapping exercise was performed using data from the PROVe study, which mainly recruited patients with early-stage MF-CTCL. Therefore, the mapped utilities from this analysis may not be comparable with those obtained from other types of MF-CTCL patients, such as those diagnosed with advanced-stage disease or who progressed after initial treatment, as included in some of the studies retrieved [24, 26]. The PROVe study had a maximum follow-up period of 2 years, which, given the slow progression of MF-CTCL, limited the amount of QoL data collected from patients who had progressed.

Fourth, we observed a large proportion (31.3%) of missing Pruritus-VAS data across all visits, which limited the application of the available algorithms to a portion of the dataset. In clinical studies, missing data are often MAR, in which case MI is the preferred technique to overcome this issue. If the MAR assumption were violated, this could lead to biased results [18]; however, since findings from CC and MI analyses almost overlapped, we were reassured of the robustness of the technique adopted for imputing missing data.

Lastly, since the PROVe study did not collect EQ-5D, we could not compare original and mapped utilities resulting from the same database, or calculate related differences, for example, through mean absolute error (MAE) and root mean squared error (RMSE), as recommended by existing guidelines [35].
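For reference, the two error measures mentioned above can be computed as in the following minimal sketch. It assumes paired lists of observed and mapped utilities for the same patients, which were not available in the PROVe study; the function names and the example values are hypothetical.

```python
import math


def mean_absolute_error(observed, mapped):
    """Average absolute difference between observed and mapped utilities."""
    return sum(abs(o - m) for o, m in zip(observed, mapped)) / len(observed)


def root_mean_squared_error(observed, mapped):
    """Square root of the average squared difference."""
    squared = [(o - m) ** 2 for o, m in zip(observed, mapped)]
    return math.sqrt(sum(squared) / len(observed))


# Invented paired utilities, for illustration only.
observed = [0.80, 0.60]
mapped = [0.90, 0.80]
mae = mean_absolute_error(observed, mapped)
rmse = root_mean_squared_error(observed, mapped)
```

RMSE penalises large individual errors more heavily than MAE, so reporting both gives a fuller picture of mapping accuracy.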

5 Conclusions

This study derived HSUVs for patients with MF-CTCL enrolled in a clinical study and, as already observed in the literature, especially in rare diseases, showed the poor applicability of mapping algorithms developed in different conditions or populations. Indeed, the algorithms of Park et al. mapping Pruritus-VAS onto EQ-5D yielded largely overestimated HSUVs compared with the values reported in previous studies on MF-CTCLs. Therefore, the mapped HSUVs cannot be used in future cost-effectiveness analyses of treatments for MF-CTCLs.

Overall, we encourage future clinical studies to collect EQ-5D directly from patients to avoid the use of mapping algorithms for deriving HSUVs. However, in conditions where the use of preference-based PROMs is challenging, the application of mapping algorithms can represent a valuable alternative. The development of mapping algorithms using disease-specific PROMs (i.e., MF/SS-CTCL QoL) is required to increase the precision of mapping estimates in CTCLs. Moreover, studies with a longer follow-up period that recruit more patients with advanced stages would allow algorithms to be generated (or tested) on a more representative MF-CTCL patient population. More research is also required to identify the most appropriate techniques to deal with the overestimation of mapped utilities.