Key Points for Decision Makers

Cost-effectiveness analysis was recently included as part of the decision-making processes to adjust prices of approved drugs and devices in Japan. The comparative cost-effectiveness of vedolizumab with other branded biologics is unknown in Japan.

As a first-line biologic for patients with moderate-to-severe active ulcerative colitis (UC), VDZ dominated golimumab and was cost-effective compared with adalimumab and infliximab.

The network meta-analysis (NMA) presented addresses the limitations of previous NMAs for treatments of moderate-to-severe active UC. The proposed economic model approach makes direct use of the evidence reported from randomized clinical trials (RCTs) and of the comparative efficacy outcomes that can be obtained via an NMA for the maintenance phase. This eliminates the need for individual patient data from RCTs for the comparators of interest and for assumptions on the comparative efficacy of treatments for transitions during the maintenance phase that are not reported in RCTs and cannot be included in an NMA.

1 Introduction

Ulcerative colitis (UC) is a chronic inflammatory bowel disease that affects the gastrointestinal tract [1]. Patients with UC suffer from ulcers that generate pus and mucus, which causes inflammation and sores in the lining of the digestive tract [1]. UC is associated with significant morbidity and mortality [2]; in addition, individuals diagnosed with UC have poorer health-related quality of life, greater healthcare resource utilization, and greater work productivity loss than individuals without UC [3]. In Japan, the prevalence and incidence of UC have rapidly increased in recent years [4, 5]. The most recent data from the Japanese Inflammatory Bowel Disease registry report a prevalence of 121.9 patients with UC per 100,000 persons in 2013 [6]. Data on the incidence of UC in Japan show a significant increase from 0.08 to 1.95 per 100,000 persons between 1960 and 1991 [7, 8].

There is no curative medical treatment for UC. The aim of current medical treatment is to induce and maintain remission; monitor, prevent, and manage complications; achieve mucosal healing; and improve quality of life [9, 10]. The appropriate medical treatment depends on the activity, severity and extent of disease [11]. For patients with moderate-to-severe UC who have not sufficiently responded to conventional therapy (CT; including 5-aminosalicylic acids [5-ASAs], immunomodulators, and steroids), Japanese clinical guidelines for UC recommend biologic tumor necrosis factor alpha antagonists (anti-TNF) infliximab (IFX), adalimumab (ADA), and golimumab (GOL); vedolizumab (VDZ), another biologic, which is a selective antagonist that binds exclusively to the α4β7 integrin heterodimer and is engineered to target lymphocyte trafficking localized in the gut; and tofacitinib (TOF), a Janus kinase inhibitor [11, 12]. If response or remission is achieved, long-term administration of the same therapy should be used for response and remission maintenance and to reduce the likelihood of colectomy [11, 12]. For patients who relapse after achieving response or remission, remission induction therapy should be administered; however, the guidelines do not provide specific recommendations for remission induction after a relapse has occurred while on a biologic (i.e., whether CT could be tried again or if another biologic should be used) [11, 12]. Surgery is recommended only for patients in whom adequate medical treatment is ineffective, as well as for patients suffering from colonic perforation, massive bleeding, toxic megacolon, and high-grade dysplasia, or those with severe disease which does not respond to medical treatments [11, 12].

The results of cost-effectiveness analyses are used to determine coverage by the publicly funded healthcare system in countries such as England, Canada, and Australia [13]. In Japan, the price and reimbursement for a new drug are discussed at the Central Social Insurance Medical Council (Chuikyo) and approved by the Ministry of Health, Labor and Welfare (MHLW). After approval, the price is listed as the price for reimbursement. List prices are currently revised every 2 years; starting in 2020, they will be revised every year [13, 14]. In April 2019, the Chuikyo started to use the results of cost-effectiveness analysis to adjust prices of approved drugs and devices in Japan: if the results of the cost-effectiveness analysis show an incremental cost-effectiveness ratio (ICER) below ¥5 million per quality-adjusted life-year (QALY) gained, the price is not adjusted; otherwise a price reduction occurs [13]. With this approach, the Chuikyo seeks to avoid delays in the approval and reimbursement of new drugs and to avoid limiting patients’ access to new drugs [13].

IFX, ADA, and GOL were approved by the MHLW in 2010, 2013, and 2017, respectively, and VDZ and TOF in 2018 for the treatment of patients with moderate-to-severe UC who have not sufficiently responded to CT [14]. The efficacy and safety of VDZ for the treatment of patients with moderate-to-severe active UC have been demonstrated in an international study, GEMINI I, and a Japanese phase III VDZ study [15, 16]. Previous studies have assessed the long-term clinical and economic consequences and cost-effectiveness of VDZ compared with other biologics for the treatment of patients with moderate-to-severe UC in the context of the United Kingdom (UK), Spain, and the United States (US) [17,18,19,20]. However, this information is still unknown in Japan and can be useful for decision makers at the time of repricing biologics for the treatment of patients with moderate-to-severe UC who have not sufficiently responded to CT. In this study, we assessed the cost-effectiveness of VDZ compared with other branded biologics (ADA, GOL, and IFX) for the treatment of patients with moderate-to-severe active UC who have not sufficiently responded to CT from the Japanese public healthcare payer perspective.

2 Methods

2.1 Model Structure

An economic model using a hybrid decision tree and Markov approach was developed for this analysis (Fig. 1).

Fig. 1

Model structure. AE adverse event, CT conventional therapy, UC ulcerative colitis. *Response: reduction in complete Mayo score by ≥ 3 points and by ≥ 30% from baseline with an accompanying decrease in the sub-score of rectal bleeding by ≥ 1 point from baseline or an absolute rectal bleeding sub-score ≤ 1. Response includes patients who achieve remission and response-only. Remission: complete Mayo score ≤ 2 and ≤ 1 in all sub-scores. Response-only: response without remission. Patients treated with a biologic can discontinue treatment at any time due to AEs and switch to CT. Patients treated with a biologic can also switch to induction CT if they lose their remission or response status during the maintenance phase

Treatment for patients with moderate-to-severe active UC with a biologic (i.e., ADA, GOL, IFX, VDZ) begins in the treatment induction phase (decision tree; Fig. 1). At the end of the induction phase, patients can achieve remission or response-only (i.e., response without remission). Patients who achieve remission or response-only and do not discontinue treatment due to treatment-related adverse events (AEs) remain on treatment and enter the maintenance phase (Markov model; Fig. 1) in the corresponding health state. During the maintenance phase, patients transition between the remission and response-only health states, unless they worsen and lose their remission or response-only status. Patients can discontinue treatment during the maintenance phase due to AEs at any time. Patients who discontinue biologic treatment due to AEs (during the induction or maintenance phase) switch to induction CT. Patients who do not achieve remission or response-only at the end of induction or who lose their remission or response-only status during the maintenance phase discontinue biologic treatment and can switch to induction CT or can undergo surgery.

Patients initiating CT induction after biologic discontinuation follow a similar pathway to patients starting on a biologic. However, it is assumed that patients on CT do not discontinue it due to AEs, and that patients who do not achieve response at the end of induction undergo surgery or are assumed to remain in active disease until they may, eventually, undergo surgery.

Patients who undergo surgery are assumed to discontinue treatment and enter the surgery section of the Markov model. These patients can achieve remission after surgery without complications or experience post-surgery complications. Once post-surgery complications are managed, patients transition to the post-surgery remission health state. However, patients in post-surgery remission are always at risk of experiencing complications.

Patients may die at any time (i.e., may transition from any health state to the death health state in any model cycle).
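The maintenance-phase logic described above can be illustrated with a small cohort trace. This is a minimal sketch under stated assumptions, not the study's model: the state list is reduced to four states (the CT and surgery branches are omitted), and every probability in `P` as well as the starting distribution is a hypothetical placeholder rather than a value from Table 1.

```python
# Simplified Markov cohort trace with annual cycles.
# All numbers are hypothetical placeholders, NOT the study's inputs.
STATES = ["remission", "response_only", "active_uc", "death"]

# Annual transition probabilities; each row sums to 1.
P = {
    "remission":     {"remission": 0.70, "response_only": 0.15, "active_uc": 0.14, "death": 0.01},
    "response_only": {"remission": 0.20, "response_only": 0.55, "active_uc": 0.24, "death": 0.01},
    "active_uc":     {"remission": 0.00, "response_only": 0.00, "active_uc": 0.99, "death": 0.01},
    "death":         {"remission": 0.00, "response_only": 0.00, "active_uc": 0.00, "death": 1.00},
}


def step(cohort):
    """Advance the cohort distribution by one model cycle."""
    nxt = {state: 0.0 for state in STATES}
    for src, share in cohort.items():
        for dst, prob in P[src].items():
            nxt[dst] += share * prob
    return nxt


def trace(start, n_cycles):
    """Return the cohort distribution at cycle 0, 1, ..., n_cycles."""
    out = [start]
    for _ in range(n_cycles):
        out.append(step(out[-1]))
    return out


# Hypothetical distribution at the end of the induction decision tree.
start = {"remission": 0.3, "response_only": 0.3, "active_uc": 0.4, "death": 0.0}
cohort_trace = trace(start, 10)
```

In this sketch, death is absorbing and can be reached from every state in every cycle, mirroring the statement that patients may die at any time.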

2.2 Model Parameters and Data Sources

2.2.1 Patient Characteristics

The target population of this analysis was adult patients with moderate-to-severe active UC, defined as patients with a complete Mayo score [21] ≥ 6, who had not had an adequate response to, had lost response to, or were intolerant to CT and, therefore, switched to biologic treatment (i.e., anti-TNF-naïve patients). The population was 61.6% male, with a mean age of 42.8 years (range 16–77), as observed in the Japanese phase III VDZ study [16, 22].

2.2.2 Treatment Comparators

For cost-effectiveness analysis in Japan, the comparators should be selected principally from among the treatments that were expected to be replaced by the intervention under evaluation (i.e., VDZ in this analysis) at the time it was introduced to treat the target population [23]. For this reason, the comparators of interest in this analysis included other branded biologics approved in Japan for the treatment of patients with moderate-to-severe active UC who have not sufficiently responded to CT: ADA, GOL, and IFX. The following treatments and dosages were considered in the analysis:

  • VDZ 300 mg at weeks 0, 2, and 6, and every 8 weeks thereafter

  • ADA 160 mg at week 0, 80 mg at week 2, and 40 mg every 2 weeks thereafter

  • GOL 200 mg at week 0, 100 mg at week 2, and 100 mg every 4 weeks thereafter

  • IFX 5 mg/kg at weeks 0, 2, and 6, and every 8 weeks thereafter

CT was not included as a comparator as it is not indicated for patients who have not sufficiently responded to CT (including 5-ASAs, immunomodulators, and steroids), and Japanese guidelines recommend the initiation of biologic treatment for these patients [11, 12]. During the systematic literature review (SLR) (see Appendix A in the electronic supplementary material [ESM], Online Resource 1), only induction studies were found for TOF in anti-TNF-naïve patients with moderate-to-severe UC. Maintenance studies identified for TOF only reported information for a mixed population of both anti-TNF-naïve and anti-TNF-experienced patients, but not for anti-TNF-naïve patients only, which limited the inclusion of TOF as a comparator in the network meta-analysis (NMA) and the model. Surgery was not included as a comparator as it is not recommended in Japanese clinical guidelines for the treatment of patients with moderate-to-severe UC, unless treatment is ineffective [11, 12]. IFX biosimilar was not included as a comparator as it was not considered among the treatments that were expected to be replaced when VDZ was introduced, because of the very low penetration of IFX biosimilar in Japan at the time this analysis started; analyses of the Japanese claims database constructed by JMDC Inc. showed that IFX biosimilar accounted for 0.8%, 1.3%, 4.6%, and 7.1% of all IFX vials (branded and biosimilar) prescribed to Japanese patients with UC in 2015, 2016, 2017, and 2018, respectively [24]. However, a scenario using the list price of IFX biosimilar was considered and is discussed in the “Limitations” section.

2.2.3 Clinical Efficacy

Clinical efficacy at the end of the induction phase was measured in terms of the percentage of patients who achieve response (including response-only and remission) and remission. The duration of the induction phase was set to 8 weeks as it was the average duration of the time points of efficacy assessment in the induction randomized controlled trials (RCTs) identified during the SLR (Table 4 and Table 5 in Appendix A in the ESM, Online Resource 1).

Clinical efficacy at the end of the maintenance phase was measured in terms of the percentage of patients who remain in remission or response-only (durable response) and in terms of the percentage of patients who remain in or achieve remission at the end of the maintenance phase. The cycle length of the maintenance phase was set to 1 year as the assessment of durable response and remission for the maintenance phase occurred approximately 1 year after the end of the induction phase in the Japanese phase III VDZ study [16].

An SLR was conducted to identify relevant RCTs that assessed the efficacy and safety of ADA, GOL, IFX, TOF and VDZ for the treatment of patients with moderate-to-severe UC who were anti-TNF-naïve. Studies including anti-TNF-experienced patients, or a mix of anti-TNF-experienced and anti-TNF-naïve patients, were also included if the study reported outcomes separately for the anti-TNF-naïve patients. Thirty-four full-text publications and 11 records identified from gray literature sources and SLR citation lists were eligible for inclusion. These 45 publications collectively reported on 18 trials; 16 trials reported the outcomes of interest for the induction phase (response [including response-only and remission] and remission), and ten for the maintenance phase (durable response and remission). To obtain the relative treatment effects for the biologics of interest versus placebo, Bayesian NMA models were used. All analyses were run in OpenBUGS 3.2.3, using code as referenced in the National Institute for Health and Care Excellence (NICE) Decision Support Unit’s series of Technical Support Documents (TSD) on evidence synthesis, TSD2 and TSD5 [25]. The four outcomes of interest were binary. All NMAs involved a 50,000-iteration “run-in” phase followed by a 50,000-iteration phase for parameter estimation. The posterior mean residual deviance was used to assess the goodness-of-fit of the model, while the deviance information criterion (DIC) was used to assess the comparative fit of fixed and random-effects models.

Unlike newer RCTs (e.g., GEMINI I [15], PURSUIT-M [26], the Japanese phase III VDZ study [16]), which featured re-randomization of responders (including remission and response-only) at the end of the induction phase, older trials (e.g., ACT 1 [27], ULTRA 1 [28]) used a treat-through design, in which all patients received the same treatment for 52 weeks during induction and maintenance. Attempts to “convert” re-randomization to treat-through data [29,30,31], or vice-versa [32], have resulted in a mix of assumptions, calculations, and imputations, not all of which are clearly stated in the papers or examined for risk of bias. We developed a novel method for imputation of treat-through to re-randomization that offers several advantages over prior approaches, avoids biased estimates against VDZ’s comparators, and better mimics current approaches to treatment, wherein induction responders remain on therapy. Additional details on the SLR and NMA, including our novel imputation method, are provided in Appendix A in the ESM, Online Resource 1.

The comparative efficacy for each biologic was measured using relative risks (RRs) versus placebo for each efficacy outcome (Table 1). The placebo arm of the Japanese phase III VDZ study was used as the reference to calculate RRs for each biologic versus placebo, based on the results of the NMA, which are presented as odds ratios (ORs) [16]. Appendix B in the ESM, Online Resource 1, shows how the RRs were calculated; the fixed effects models were used for the four outcomes of interest as they showed a lower DIC and residual deviance than the random-effects models and because of the sparsity of the data in the networks.
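To make the OR-to-RR step concrete, the sketch below applies the standard conversion through a reference-arm probability: RR = OR / (1 − p_ref + p_ref × OR). It is an illustration only; the exact calculation used in the study is the one given in Appendix B, and the function name `or_to_rr` and the input values are hypothetical.

```python
def or_to_rr(odds_ratio, p_ref):
    """Convert an odds ratio versus placebo into a relative risk,
    given the event probability in the reference (placebo) arm."""
    odds_ref = p_ref / (1.0 - p_ref)      # placebo odds
    odds_trt = odds_ratio * odds_ref      # treatment odds implied by the OR
    p_trt = odds_trt / (1.0 + odds_trt)   # back-transform to a probability
    return p_trt / p_ref


# Hypothetical inputs: OR of 2.0 and a placebo event probability of 0.25.
rr_example = or_to_rr(2.0, 0.25)
```

Because the RR depends on the reference probability, the same OR yields different RRs for different placebo arms, which is why a single reference arm is fixed before the conversion.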

Table 1 Treatment efficacy, safety, and discontinuation

It was assumed that the treatment effect was constant if patients remained on treatment and that transition probabilities beyond the first year of treatment during the maintenance phase were the same as those estimated for the first year.

For patients initiating CT induction after biologic discontinuation, the placebo arm of the Japanese phase III VDZ study for patients who were anti-TNF-experienced was used: the probability of response (including response-only and remission) and remission at the end of induction was 0.293 and 0.098, respectively; the probability of durable response and remission at the end of maintenance was 0.357 and 0.214, respectively [16].

2.2.4 Treatment Discontinuation and Adverse Events

Patients receiving biologic therapy may discontinue treatment due to AEs. The risk of treatment discontinuation due to AEs during the induction and maintenance phase was derived from the RCTs of each biologic [15, 26,27,28, 33,34,35,36,37,38]. The incidence rate of treatment discontinuation due to AEs was calculated as the proportion of patients who discontinued treatment due to AEs over the study duration, scaling the study duration to 8 weeks for the induction phase and to 1 year for the maintenance phase. If more than one RCT was used for a specific treatment, a weighted average of the incidence rates was calculated, taking into account the study duration and the number of patients in each study. It was assumed that the rate of treatment discontinuation was constant over time, and that beyond the first year of treatment during the maintenance phase, it was the same as for the first year. The rates of treatment discontinuation due to AEs were transformed into probabilities [39] and are shown in Table 1.
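The rate-to-probability step can be sketched as follows, assuming a constant rate and the usual exponential relationship p = 1 − exp(−rate × time). The reading of "proportion over study duration" as a simple division is an assumption, and the 5% discontinuation over 52 weeks is a hypothetical input, not a value from Table 1.

```python
import math


def incidence_rate(proportion, study_weeks):
    """Weekly incidence rate, read here as events per patient per week (assumption)."""
    return proportion / study_weeks


def rate_to_probability(rate, duration):
    """Probability of at least one event over `duration`, assuming a constant rate."""
    return 1.0 - math.exp(-rate * duration)


# Hypothetical trial: 5% of patients discontinued due to AEs over 52 weeks.
weekly_rate = incidence_rate(0.05, 52.0)
p_induction = rate_to_probability(weekly_rate, 8.0)     # 8-week induction phase
p_maintenance = rate_to_probability(weekly_rate, 52.0)  # 1-year maintenance cycle
```

Working on the rate scale before converting back is what allows a single trial estimate to be rescaled to cycles of different lengths.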

Japanese clinical practice guidelines recommend that if response or remission is achieved, long-term administration of the same therapy should be used for response and remission maintenance and to reduce the likelihood of colectomy [11, 12]. Therefore, no maximum treatment duration was considered for biologics in the model, and patients received treatment for as long as they benefited from it. It was assumed that patients on CT after discontinuing biologic treatment did not discontinue CT due to AEs.

AEs included in the model were erythema, infusion-associated reaction, injection site reaction, rash, and skin and subcutaneous disorder (Table 1). These AEs were the most commonly reported (≥ 10% incidence) in any arm of the RCTs of each biologic [15, 22, 26, 27, 33,34,35,36,37, 40]. It was assumed that the incidence of AEs was constant over time and that beyond the first year of treatment during the maintenance phase it was the same as for the first year.

2.2.5 Surgery

The annual rate of surgery during the maintenance phase was calculated as 1.10% based on publicly available information: the estimated number of people with UC in Japan (n = 166,060) [41]; the percentage of patients with moderate-to-severe UC in Japan (30%) [42]; and the number of annual UC-related surgeries (n = 305) conducted in Japan in the same year in 16 hospitals, out of 390 UC-related hospitalizations [43], scaled to the total number of hospitalizations for UC-related surgeries (n = 702): \(1.1\% = \left[ 702 \times \left( 305/390 \right) \right] / \left( 166{,}060 \times 30\% \right)\). This estimate was consistent with the estimate of 1% provided by a Japanese clinician (Dr. Ryuichi Iwakiri [RI]).
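The arithmetic behind this estimate can be reproduced in a few lines. The inputs are the figures quoted above; the variable names are illustrative only.

```python
# Inputs quoted in the text; variable names are illustrative.
uc_patients_japan = 166_060         # estimated number of people with UC in Japan
share_moderate_to_severe = 0.30     # share with moderate-to-severe UC
total_hospitalizations = 702        # total hospitalizations used for scaling
surgeries_in_sample = 305           # UC-related surgeries in the 16-hospital sample
hospitalizations_in_sample = 390    # UC-related hospitalizations in the same sample

# Scale the sample's surgery share to the national hospitalization count.
annual_surgeries = total_hospitalizations * surgeries_in_sample / hospitalizations_in_sample

# Divide by the moderate-to-severe UC population to get an annual rate.
annual_surgery_rate = annual_surgeries / (uc_patients_japan * share_moderate_to_severe)

print(f"Estimated surgeries per year: {annual_surgeries:.0f}")
print(f"Annual surgery rate: {annual_surgery_rate:.2%}")
```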

The annual risk of developing surgical complications was based on an analysis conducted by Arai and colleagues (2005), who reported the incidence of complications within the first year of surgery to be 46.3% [44].

2.2.6 Mortality

There is no evidence that patients with UC have a lower life expectancy than the general population [45]. Therefore, and consistent with previous cost-effectiveness analyses of biologics in UC [17,18,19, 46, 47], the risk of death for patients with UC was the same as the all-cause risk of death of the Japanese general population, derived from age- and sex-specific 2016 life tables published by the MHLW in Japan [48], and did not vary between health states in the model.

It was assumed that no fatalities occurred due to surgery or post-surgery complications.

2.2.7 Utilities

Health state utilities were not available from patients with UC in Japan at the time of this analysis and were obtained from previous studies in patients from the UK (Table 2) [49, 50]. Utility decrements associated with AEs were obtained from a previous cost-effectiveness analysis of VDZ for the treatment of patients with moderate-to-severe UC in the UK (Table 1), assuming the duration of all AEs was 8 weeks during induction and maintenance [18, 51].

Table 2 Costs and health-state utilities

2.2.8 Costs

Direct costs included treatment-related costs (i.e., drug acquisition, administration, and concomitant medications), cost of surgery and surgical complications, disease management costs by health state, and AE management costs. All costs were obtained in 2018 Japanese yen (¥) and are shown in Table 2.

The costs of biologics and concomitant medications were obtained from Japan’s National Health Insurance (NHI) drug list for 2018 [52]. Sharing of vials to minimize vial wastage was not considered for biologics. The list of concomitant medications was health-state specific, based on the concomitant medications reported in the Japanese phase III VDZ study [22]. The frequency of resource use associated with treatment administration and health-state–specific UC management costs were estimated based on analyses of the JMDC database [24], along with consultation with a Japanese clinician (Dr. RI, Appendix C in the ESM, Online Resource 1; Dr. RI was asked to review the resource use estimates and to confirm whether they were consistent with his expectations, based on his experience in UC). Unit costs were derived from Japan’s NHI medical fee table 2018 [53].

The cost of surgery was based on the mean expenditure per hospitalization observed for the total colectomy anal anastomosis procedure in patients with UC in the JMDC database. The costs of post-surgery complications included a 28-day medical management cost of pouchitis (Appendix C in the ESM, Online Resource 1). The costs of a surgical procedure and complications were accrued as a one-time cost.

For AE management costs, only minimal treatment, such as the administration of saline solution and antibiotics, was considered. The costs of saline solution and antibiotics were obtained from the NHI drug list 2018 [52].

2.2.9 Model Verification and Validation

The face validity of the model was established by presenting the model concept and structure, comparators, and assumptions to two health economics experts (Senior Research Scientists within Evidence Synthesis, Modeling and Communication at Evidera, a company which provides consulting and other research services to pharmaceutical, medical device, and other organizations, who were external to the team of authors of this study and who became familiar with the disease area) and one Japanese clinician (Dr. RI), to ensure that the model was designed correctly and rigorously from the modeling standpoint and the perspective of clinical face validity. Extreme-value sensitivity analyses were conducted to determine the technical accuracy of the model and for logical consistency. Upon completion of the model, a comprehensive and rigorous quality check was performed validating its logical structure, mathematical formulas and sequences of calculations, as well as the model inputs. Predictive validity was checked by comparing key outcomes, including the number of patients who achieved response and remission at the end of the induction phase and who achieved durable response and remission at the end of the first year of maintenance, to the source data.

2.2.10 Analyses

The life-table method for half-cycle correction was used to calculate all model outcomes [54]. As recommended by the guidelines for economic evaluation of drugs and medical devices in Japan, this analysis took the public healthcare payer’s perspective, the time horizon of analysis was lifetime, and the annual discount rate for costs and health benefits was 2% [23]. A treatment was considered cost-effective at ICERs below a conservative willingness-to-pay (WTP) threshold of ¥5,000,000 per QALY gained [13, 55].
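As an illustration of how half-cycle correction and discounting combine, the sketch below averages the cohort value at the start and end of each annual cycle (the life-table method) and discounts cycle t by (1 + r)^t. The discount timing convention and the input series are assumptions made for this example, not details taken from the study.

```python
def discounted_total(values_at_cycle_bounds, annual_rate=0.02):
    """Sum a per-cycle outcome with life-table half-cycle correction.

    values_at_cycle_bounds[t] is the cohort-average outcome (e.g., utility
    or cost per year) at the start of cycle t; the final element is the
    value at the end of the last cycle.
    """
    total = 0.0
    for t in range(len(values_at_cycle_bounds) - 1):
        # Life-table method: average of start-of-cycle and end-of-cycle values.
        mid_cycle = 0.5 * (values_at_cycle_bounds[t] + values_at_cycle_bounds[t + 1])
        # Assumed convention: the first cycle (t = 0) is undiscounted.
        total += mid_cycle / (1.0 + annual_rate) ** t
    return total


# Hypothetical cohort-average utilities at years 0, 1, 2, and 3.
qalys = discounted_total([0.80, 0.78, 0.75, 0.70], annual_rate=0.02)
```

The same function can be applied to a cost series, so that costs and QALYs are corrected and discounted consistently before an ICER is formed.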

Deterministic sensitivity analyses (DSAs) were conducted to assess the impact of all model parameters on the ICERs. When possible, 95% confidence intervals (CIs) were used to vary the parameters. Otherwise, parameters were varied by ± 20%.
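A one-way DSA of this kind can be sketched as a loop that perturbs one parameter at a time and ranks parameters by the resulting spread in the ICER. The model function `toy_icer` and all parameter names and values below are hypothetical stand-ins; only the ± 20% variation follows the text.

```python
def toy_icer(p):
    """Hypothetical stand-in for the economic model: returns an ICER in yen per QALY."""
    d_cost = (p["extra_drug_cost_per_year"] * p["years_on_treatment"]
              - p["surgery_cost"] * p["surgeries_avoided"])
    d_qaly = p["utility_gain"] * p["years_in_better_state"]
    return d_cost / d_qaly


def one_way_dsa(base, model, rel_change=0.20):
    """Vary each parameter by +/- rel_change and sort by ICER spread (tornado order)."""
    rows = []
    for name, value in base.items():
        low = model({**base, name: value * (1.0 - rel_change)})
        high = model({**base, name: value * (1.0 + rel_change)})
        rows.append((name, low, high, abs(high - low)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical base-case inputs, NOT the study's parameters.
base_case = {
    "extra_drug_cost_per_year": 500_000.0,
    "years_on_treatment": 4.0,
    "surgery_cost": 1_500_000.0,
    "surgeries_avoided": 0.1,
    "utility_gain": 0.11,
    "years_in_better_state": 4.0,
}
tornado = one_way_dsa(base_case, toy_icer)
```

Where a 95% CI is available, the same loop would use the CI bounds for that parameter in place of the ± 20% range.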

To account for statistical uncertainties in the model parameters, probabilistic sensitivity analyses (PSAs) were conducted by varying simultaneously the model parameters using the probability distributions and parameters presented in Appendix D in the ESM, Online Resource 1. Drug acquisition and administration costs were not included in the PSA as they were not subject to parameter uncertainty. The PSA comprised 5000 simulations; in each simulation, input parameter values were drawn at random from their respective distributions and the ICERs were calculated.
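The PSA procedure can be illustrated with a compact simulation that also produces the quantity plotted on a cost-effectiveness acceptability curve: the share of simulations in which each strategy has the highest net monetary benefit (WTP × QALYs − cost). This is a sketch under stated assumptions; the two strategies "A" and "B", the gamma and normal distributions, and all of their parameters are hypothetical, whereas the study's actual distributions are listed in Appendix D.

```python
import random


def run_psa(n_sims, wtp, seed=0):
    """Return, per strategy, the share of simulations with the highest net benefit."""
    rng = random.Random(seed)
    wins = {"A": 0, "B": 0}
    for _ in range(n_sims):
        # Hypothetical draws of (lifetime cost in yen, lifetime QALYs) per strategy.
        draws = {
            "A": (rng.gammavariate(100, 100_000), rng.gauss(13.0, 0.3)),
            "B": (rng.gammavariate(100, 120_000), rng.gauss(13.4, 0.3)),
        }
        # Net monetary benefit for each strategy in this simulation.
        nmb = {name: wtp * qaly - cost for name, (cost, qaly) in draws.items()}
        wins[max(nmb, key=nmb.get)] += 1
    return {name: count / n_sims for name, count in wins.items()}


# One point on the acceptability curve, at a WTP of 5,000,000 yen per QALY.
prob_optimal = run_psa(5000, wtp=5_000_000)
```

Repeating the call over a grid of WTP values traces out one curve per strategy; in each simulation exactly one strategy is counted as optimal, so the shares sum to 1 at every threshold.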

3 Results

3.1 Base-Case Analysis

As shown in Table 3, over a lifetime, treatment with VDZ was associated with more time spent on biologic treatment, response and remission, and fewer surgeries compared with the other biologics. Compared with GOL, VDZ yielded greater QALYs at a lower total cost and was dominant. Compared with ADA and IFX, VDZ yielded greater QALYs at a higher cost, with ICERs below the WTP threshold of ¥5,000,000. Therefore, VDZ was cost-effective compared with ADA and IFX.
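The decision rule applied to each pairwise comparison can be written down explicitly. The sketch below classifies a comparison from its incremental cost and incremental QALYs against the ¥5,000,000 threshold; the function name, the labels, the handling of ties, and the example increments are illustrative assumptions rather than values from Table 3.

```python
def classify(d_cost, d_qaly, wtp=5_000_000):
    """Classify a new treatment versus a comparator.

    d_cost and d_qaly are incremental cost (yen) and incremental QALYs
    of the new treatment minus the comparator.
    """
    if d_cost <= 0 and d_qaly >= 0:
        return "dominant"                      # no more costly, no less effective
    if d_cost >= 0 and d_qaly <= 0:
        return "dominated"                     # no less costly, no more effective
    if d_cost > 0:                             # here d_qaly > 0: compute the ICER
        icer = d_cost / d_qaly
        return "cost-effective" if icer < wtp else "not cost-effective"
    return "less costly, less effective"       # ICER not directly interpretable


# Hypothetical increments for illustration only.
examples = [
    classify(-100_000, 0.20),    # cheaper and better
    classify(1_000_000, 0.25),   # ICER of 4,000,000 yen per QALY
    classify(3_000_000, 0.25),   # ICER of 12,000,000 yen per QALY
]
```

In this sketch a comparison with zero incremental cost and zero incremental QALYs falls into the first branch; a real analysis would typically report such a case as equivalent.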

Table 3 Base-case outcomes over lifetime

3.2 Deterministic Sensitivity Analyses

The DSA showed that VDZ was dominant or cost-effective versus GOL in all scenarios. Figures 2 and 3 present the results from the sensitivity analyses versus ADA and IFX, respectively, for the top 10 parameters that had the largest effect on the ICERs, in the order of their respective influence. It should be noted that, unlike in other countries, the list price of a drug in Japan is the price for reimbursement covered by the NHI, and this price is revised every 2 years [13, 56]. Therefore, variations in the cost per pack of VDZ, ADA, GOL, and IFX provide information about the potential ICERs if the list price of any of these biologics were to be revised in the future.

Fig. 2

Deterministic sensitivity analysis VDZ vs. ADA. ADA adalimumab, ICER incremental cost-effectiveness ratio, RR relative risk, UC ulcerative colitis, VDZ vedolizumab

Fig. 3

Deterministic sensitivity analysis VDZ vs. IFX. ICER incremental cost-effectiveness ratio, IFX infliximab, RR relative risk, VDZ vedolizumab. *In this scenario, IFX was dominant (i.e., IFX was more effective and less costly compared with VDZ); ICER not shown. In these scenarios, VDZ was dominant (i.e., VDZ was less costly and more effective compared to IFX); ICER not shown. In these scenarios, VDZ was less costly and less effective compared to IFX; ICER not shown

Detailed results of the scenarios for VDZ versus ADA and IFX in which the ICER was above ¥5,000,000 per QALY gained are presented in Appendix E in the ESM, Online Resource 1.

3.3 Probabilistic Sensitivity Analysis

At a WTP threshold of ¥5,000,000 per QALY gained, VDZ had the highest probability of producing the greatest expected net benefit among the four biologics (VDZ, ADA, GOL, and IFX). VDZ was the optimal treatment choice at any WTP threshold > ¥4,500,000 per QALY gained (Fig. 4).

Fig. 4

Multi-way cost-effectiveness acceptability curves. ADA adalimumab, GOL golimumab, IFX infliximab, QALY quality-adjusted life-year, VDZ vedolizumab; WTP willingness-to-pay

4 Discussion

To our knowledge, this is the first study to assess the cost-effectiveness of biologics for the treatment of anti-TNF-naïve patients with moderate-to-severe active UC from Japan’s public healthcare payer perspective. Findings from this cost-effectiveness analysis can further inform patients and prescribers when considering VDZ over other biologics (ADA, GOL, and IFX), and can be useful for decision makers at the time of repricing biologics. This analysis suggests that long-term treatment with VDZ is associated with clinical benefits over other biologics, including additional time spent in response and remission, fewer surgeries, and more QALYs gained. These benefits were obtained at a lower total cost compared with GOL. VDZ resulted in higher total costs compared with ADA and IFX; however, the resulting ICERs fell below the WTP threshold of ¥5,000,000 per QALY gained, making VDZ cost-effective compared with ADA and IFX. Both DSA and PSA consistently supported the robustness of the findings in the base-case analysis, indicating that VDZ was either dominant or cost-effective in most scenarios and replications.

Our findings are consistent with two recent cost-effectiveness analyses in the UK and Spain that examined VDZ compared with ADA, GOL, and IFX for the treatment of patients with moderate-to-severe UC who were anti-TNF-naïve [18, 19]. Both studies also showed that long-term treatment with VDZ was associated with greater QALYs compared with ADA, GOL, and IFX [18, 19]. In the UK analysis, the time horizon was 30 years as a proxy for lifetime, and VDZ was dominant over GOL and IFX and cost-effective compared with ADA [18]. The analysis from Spain used a time horizon of 10 years [19]. This short time horizon made it difficult to directly compare with our cost-effectiveness results [19]. However, when we set the time horizon to 10 years, VDZ remained dominant compared with GOL and cost-effective compared with ADA and IFX, with ICERs lower than with a lifetime horizon in the base case (¥4,089,246 and ¥3,577,489, respectively). A third study assessed the cost-effectiveness of treatment strategies/sequences of treatments for the treatment of moderate-to-severe UC patients who were anti-TNF-naïve in the UK and China [47]. Consistent with our study and the previous studies in the UK and Spain, long-term treatment with VDZ was associated with greater QALYs compared with ADA, GOL, and IFX when all biologics were followed by CT after biologic discontinuation. In our study, the total QALYs for the biologics ranged between 13.07 and 13.43. The study conducted for Spain reported lower QALYs for the biologics, between 5.07 and 6.00, as the time horizon of analysis was 10 years [19]. In the study by Wilson et al. (2018) for the UK, the time horizon of analysis was lifetime and the QALYs for the biologics ranged between 13.79 and 14.08; in this study by Wilson et al., the health states were not based on response and remission, but on disease severity (moderate-severe, mild, and remission) [18]. 
The utility value used by Wilson et al. for patients in remission was 0.87, in line with our analysis and the other studies for the UK, Spain and China [19, 46, 47]; however, the utility values used by Wilson et al. for the mild and moderate-severe UC health states were 0.80 and 0.68, respectively, which are higher than those for the response-only (0.76) and active UC (0.41 or 0.42) health states used in our analysis and the other studies [19, 46, 47]. Wilson et al. used lower utility values than our analysis for the surgery, post-surgery remission, and post-surgery complications health states (0.42 vs. 0.66, 0.60 vs. 0.71, and 0.42 vs. 0.66, respectively) [18]; however, because the number of surgeries and complications is low, these lower values do not offset the additional QALYs accrued with the higher utility values for the mild and moderate-severe UC health states in the study by Wilson et al. versus those used in our analysis for the response-only and active UC health states. In the study by Wu et al. (2018), the time horizon of analysis was lifetime and the QALYs for biologics (ADA, GOL, IFX, and VDZ) followed by CT after discontinuation ranged between 10.71 and 11.48 in the analysis for the UK, and between 8.16 and 8.92 in the analysis for China [47]. All the efficacy and utility parameters used in the analyses by Wu et al. were the same for the UK and China, except for the life expectancy tables, which were country specific; the difference in life expectancy seems to be the only explanation for the different QALYs between the analyses for the UK and China, although life-years are not reported in the study to confirm this. In our study, we used Japanese life tables published by the MHLW to model mortality, which could explain the higher QALYs for biologics reported in our study, as the life expectancy in Japan (84 years) is higher than that in the UK (81 years) and China (76 years) [57]. The utility values used in our analysis were almost the same as those used by Wu et al. [47]: remission (0.88 in the study by Wu et al. vs. 0.87 in our analysis), response-only (0.76 in both analyses), and active UC (0.42 vs. 0.41), with larger differences for the surgery-related health states of post-surgery remission (0.60 vs. 0.71) and post-surgery complications (0.42 vs. 0.66). We ran a scenario in our model using the utility values used by Wu et al., and VDZ remained dominant compared with GOL and cost-effective compared with ADA and IFX, with ICERs lower than in the base case (¥4,358,589 and ¥4,065,464, respectively).
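
The terms "dominant" and "cost-effective" used in these comparisons can be illustrated with a minimal sketch. It assumes the usual definition of the ICER as incremental cost divided by incremental QALYs and a willingness-to-pay threshold of ¥5,000,000 per QALY gained; the function names and the input values in the usage note are hypothetical and are not results of our model.

```python
from typing import Optional


def icer(delta_cost: float, delta_qaly: float) -> Optional[float]:
    """Incremental cost per QALY gained; None when QALYs do not differ."""
    if delta_qaly == 0:
        return None
    return delta_cost / delta_qaly


def classify(cost_new: float, qaly_new: float,
             cost_ref: float, qaly_ref: float,
             threshold: float = 5_000_000) -> str:
    """Label a new treatment versus a reference on the cost-effectiveness plane.

    The default threshold (yen per QALY gained) is an assumption of this sketch.
    """
    d_cost = cost_new - cost_ref
    d_qaly = qaly_new - qaly_ref
    if d_qaly == 0:
        return "no QALY difference"
    if d_cost < 0 and d_qaly > 0:
        return "dominant"              # cheaper and more effective
    if d_cost > 0 and d_qaly < 0:
        return "dominated"             # more expensive and less effective
    if d_qaly < 0:
        return "less costly, less effective"
    # More effective at equal or higher cost: compare the ICER with the threshold.
    if icer(d_cost, d_qaly) <= threshold:
        return "cost-effective"
    return "not cost-effective"
```

For example, with hypothetical totals, `classify(30_000_000, 13.4, 29_000_000, 13.1)` compares an incremental cost of ¥1,000,000 with an incremental gain of about 0.3 QALYs and returns `"cost-effective"` under the assumed threshold.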

Our model attempts to overcome limitations associated with previous cost-effectiveness analyses of biologics in the treatment of moderate-to-severe UC. First, NMAs are a common and accepted approach to assess the relative efficacy of multiple treatments in the absence of head-to-head RCTs [58, 59]. For this analysis, we conducted an SLR and NMA, including evidence from a phase III study in Japanese patients with moderate-to-severe UC who are anti-TNF-naïve [16, 22]. Due to differences in RCT designs (e.g., treat-through versus re-randomization), imputation of treat-through studies was required to allow for indirect comparisons of ADA and IFX for the maintenance phase. We developed a novel method for imputation of treat-through design to re-randomization that offers several advantages over prior approaches to avoid biased estimates against VDZ’s comparators and to better mimic current approaches to treatment wherein induction responders remain on therapy. To assess the impact of our approach on the cost-effectiveness results, we ran a scenario using the results of a previous NMA by Vickers et al. (2016), in which ORs for VDZ, ADA, GOL, and IFX versus placebo were reported for the four outcomes of interest in patients with moderate-to-severe UC who are anti-TNF-naïve [32]. As in our base-case analysis, we transformed the ORs reported by Vickers et al. into RRs using the placebo arm of the Japanese phase III VDZ study as the reference. In this scenario, VDZ remained dominant compared with GOL and cost-effective compared with ADA and IFX, with ICERs lower than in our base case (¥4,483,564 and ¥3,058,130, respectively). The ICERs are lower in the scenario using the ORs reported by Vickers et al., because all the clinical outcomes (time on biologic treatment, time in response and remission, and consequently QALYs) improved for VDZ, while they worsened for all the other comparators. 
When comparing our findings with those of the NMA by Vickers et al., it is important to note that differences in the assumptions and imputation approaches are not the only factors driving differences in the observed results of the NMA and, therefore, in the scenarios run for the cost-effectiveness analysis; our NMA included nine and five additional RCTs reporting the outcomes of interest for induction and maintenance, respectively, including the Japanese phase III VDZ study. However, the approach used by Vickers et al. to impute treat-through data and allow for comparison to re-randomization data includes a strong bias in favor of placebo, as they assumed that the number of placebo responders during induction would be the same as the number of treatment responders during induction. This assumption inflated the estimate of durable response and remission during maintenance for placebo and resulted in under-estimation of the efficacy of ADA and IFX compared with placebo, which contributes to the lower ICERs in the cost-effectiveness analyses for VDZ versus ADA and IFX.
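
The transformation of ORs into RRs against a reference arm can be sketched as follows. The exact formula is not stated in the text above, so this example assumes the standard conversion RR = OR / (1 − p0 + p0 × OR), where p0 is the probability of the outcome in the reference (placebo) arm; the function names and the numbers in the usage note are hypothetical.

```python
def odds_ratio_to_relative_risk(odds_ratio: float, baseline_risk: float) -> float:
    """Convert an OR versus the reference arm into an RR.

    baseline_risk is the outcome probability in the reference (placebo) arm.
    Uses the standard conversion RR = OR / (1 - p0 + p0 * OR), which is an
    assumption of this sketch rather than a formula quoted from the analysis.
    """
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    return odds_ratio / (1.0 - baseline_risk + baseline_risk * odds_ratio)


def treated_probability(odds_ratio: float, baseline_risk: float) -> float:
    """Outcome probability on treatment implied by the OR and the reference arm."""
    return baseline_risk * odds_ratio_to_relative_risk(odds_ratio, baseline_risk)
```

With a hypothetical OR of 2.0 and a reference-arm probability of 0.5, the RR is 2.0 / 1.5, or about 1.33, and the implied probability on treatment is about 0.67. Because the RR depends on p0, the same OR yields different RRs for different reference arms, which is why the choice of reference matters.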

Second, previous cost-effectiveness analyses have characterized disease severity during the maintenance phase using three health states, defined by patients’ Mayo score: remission (Mayo score 0–2), mild UC (Mayo score 3–5), and moderate-to-severe UC (Mayo score 6–12) [17, 18, 60]. This approach has been criticized by an independent Evidence Review Group (ERG) during a NICE single technology appraisal of VDZ for the treatment of patients with moderate-to-severe UC in the UK, as this approach ignored that patients in the mild and moderate-to-severe UC health states can include both patients with and without response [61]. The ERG suggested that this issue could have been addressed by modeling the maintenance phase transitions between moderate-to-severe UC, response, and remission health states, using the patient-level data from the GEMINI I clinical trial [15, 61]. We followed the ERG’s recommendation, and the model structure is in line with previous models [62]. However, unlike in previous studies, we did not have access to the individual patient data from any RCT of the comparators included in this analysis to derive transition probabilities between the maintenance phase health states. Only with access to patient-level data is it possible to estimate all the transitions between the maintenance phase health states: remission to remission, remission to response-only, remission to active UC, response-only to response-only, response-only to remission, response-only to active UC, active UC to active UC. For example, in the study by Tappenden et al. (2016), data relating to response and remission for the comparators included in their analysis were obtained directly from the manufacturer of the biologics, which allowed them to derive all the transition probabilities between the maintenance phase health states for all the comparators [46]. 
However, most studies will typically have access to the individual patient data from RCTs of only one, or at most two, comparators of interest, for which all the transition probabilities between the maintenance phase health states can be derived; meta-analyses and indirect comparisons (e.g., NMA) of relevant RCTs are then used to derive the efficacy parameters of the remaining comparators of interest. Outcomes from maintenance RCTs for which individual patient data are not available are usually presented in terms of durable response and remission, and sometimes durable remission. Therefore, an NMA cannot provide information on the relative efficacy of comparators for certain transition probabilities (e.g., remission to response-only, response-only to remission, response-only to active UC). Previous models have not provided a clear explanation regarding how three potential outcomes from an NMA have been used to modify/adjust five to seven transition probabilities, derived from patient-level data from one or a few RCTs of the comparator of interest, to account for the treatment effects.

With our modeling approach, we can model the transitions between the maintenance phase health states directly using the evidence reported from RCTs and applying the comparative efficacy outcomes that can be obtained from an NMA (durable response and remission), without requiring individual patient data from RCTs to estimate all the possible transitions during the maintenance phase. At the beginning of each cycle, the numbers of patients in the remission and response-only health states are known. The model then uses the probability of durable response to calculate the total number of patients who remain in these two health states (remission and response-only) and the probability of remission to determine the number of patients who are allocated to the remission health state at the end of each cycle. Patients who experience durable response but not remission are allocated to the response-only health state; this allocation implies that some patients remain in the response-only health state, others improve slightly from response-only to remission, and others worsen slightly from remission to response-only. Patients who do not experience durable response are assumed to lose their remission or response-only status and transition to the active UC health state. The limitation of this approach is that the cycle length of the maintenance phase has to be in line with the follow-up time at which the durable response and remission outcomes are reported from the RCTs.
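
The allocation described above can be illustrated with a minimal sketch of a single maintenance cycle. It assumes that both probabilities are applied to the same denominator (all patients in remission or response-only at the start of the cycle) and that the probability of remission does not exceed the probability of durable response; it ignores mortality, treatment discontinuation, surgery, and AEs. The class and function names are hypothetical and do not reproduce the actual model.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceState:
    """Number of patients in each maintenance phase health state."""
    remission: float
    response_only: float
    active_uc: float


def run_cycle(state: MaintenanceState,
              p_durable_response: float,
              p_remission: float) -> MaintenanceState:
    """Advance one cycle using only the two NMA-type outcomes.

    Assumption of this sketch: both probabilities apply to all patients who
    start the cycle in remission or response-only.
    """
    if not 0.0 <= p_remission <= p_durable_response <= 1.0:
        raise ValueError("require 0 <= p_remission <= p_durable_response <= 1")
    responders = state.remission + state.response_only
    durable = responders * p_durable_response   # stay in remission or response-only
    in_remission = responders * p_remission     # allocated to remission
    response_only = durable - in_remission      # durable response without remission
    lost_response = responders - durable        # move to active UC
    return MaintenanceState(
        remission=in_remission,
        response_only=response_only,
        active_uc=state.active_uc + lost_response,
    )
```

For example, starting from 400 patients in remission and 600 in response-only, with hypothetical probabilities of 0.5 for durable response and 0.25 for remission, one cycle leaves 250 patients in remission, 250 in response-only, and 500 in active UC. The sketch never needs the individual transitions (e.g., remission to response-only), which is the point of the approach.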

4.1 Limitations

First, some RCTs included in the NMA were international, and the comparative efficacy and safety estimates from these RCTs were not specific to the Japanese population. In addition, due to a lack of available quality-of-life data for Japanese (or even Asian) patients with UC, health-state utility estimates were obtained from UK population studies. Second, the RCTs used to inform the comparative efficacy and the rates of AEs for biologics had an approximate follow-up duration of 1 year. It was assumed that the treatment effect was constant if patients remained on treatment and that transition probabilities and the incidence of AEs beyond the first year of treatment during the maintenance phase were the same as those estimated for the first year. As this analysis considered a lifetime horizon, the extrapolation of the treatment efficacy and safety after 1 year is subject to uncertainty. Third, 100% adherence to biologics was assumed (i.e., drug holidays or dose intensity were not modeled), which may have resulted in an overestimation of drug costs over the model time horizon; however, this impacts all biologics, potentially cancelling out any impact in favor of a specific comparator. Fourth, the impact of subsequent biologic treatment following failure of first-line biologic was not considered. After failure of the first-line biologic, patients are assumed to switch to CT, which was modeled using the placebo arm of the Japanese phase III VDZ study for patients who were anti-TNF-experienced. This assumption implies that the efficacy of CT for patients who were anti-TNF-experienced is the same after failure with all biologics, including VDZ. We decided to make this assumption and excluded subsequent lines of biologic treatment as including this feature could mask the true value of a specific first-line biologic, as benefits could result from other subsequent treatments. 
However, future analyses should examine the cost-effectiveness of different treatment sequences, to obtain information regarding the cost-effectiveness of specific sequences of biologics, and of individual biologics as part of a sequence of treatments and where they should be placed to maximize health economic value; the study by Wu et al. can provide a starting point for such analyses [47]. Fifth, in this analysis we focused on comparing VDZ with other biologics in anti-TNF-naïve patients with moderate-to-severe UC. TOF was not included as a comparator in our analysis for the reasons stated in the “Treatment Comparators” section. If information regarding the efficacy of TOF for the treatment of anti-TNF-naïve patients with moderate-to-severe UC who have not sufficiently responded to CT becomes available, TOF should be included as a comparator in future analyses. Finally, IFX biosimilar was not considered a relevant comparator in our analysis for the reasons stated in the “Treatment Comparators” section. The list price of IFX biosimilar in Japan is ¥50,042 for 100 mg, 62% of the price of the branded IFX (¥80,426 for 100 mg). If we assume that IFX biosimilar has the same efficacy as the branded IFX, IFX biosimilar would be dominant compared with ADA, GOL, and the branded IFX. Compared with IFX biosimilar, VDZ would be associated with more time spent on treatment, in response, and in remission, fewer surgeries, and greater QALYs. Disease management and surgery-related costs would also be lower for patients treated with VDZ; however, because of the lower drug acquisition cost of IFX biosimilar, its treatment cost and total cost would be lower, resulting in an ICER above ¥5,000,000 per QALY gained for VDZ versus IFX biosimilar.

5 Conclusion

This analysis suggests that VDZ can save costs and is more effective compared with GOL, and is cost-effective compared with ADA and branded IFX for the first-line biologic treatment of patients with moderate-to-severe active UC who have not had an adequate response with, have lost response to, or are intolerant to a CT in Japan.