Plain Language Summary

Acute lymphoblastic leukemia (ALL) is a rare blood cancer typically diagnosed in children or adults over 50, with a high rate of recurrence. While initial chemotherapy regimens can bring about complete remission in high proportions of adults with ALL, the disease will recur in many; this is called “relapsed” or “refractory” (R/R) ALL. With each recurrence, treatments become less effective. The best long-term survival option for patients in remission is transplantation of the stem cells that produce our blood components (hematopoietic stem cell transplantation, or HSCT). But not everyone can tolerate HSCT, particularly older, frailer patients, and ALL can recur yet again. The 5-year survival among patients with R/R ALL is only 10%.

Two immunotherapy drugs have been approved in the US for second-line or later treatment of R/R ALL: inotuzumab ozogamicin (InO) and blinatumomab (Blina). Each drug’s efficacy was demonstrated in its own clinical trial, but medical and economic decision-makers also need to know how the two compare. Because no clinical trial has yet compared them directly, we used indirect treatment comparison (ITC) methods to assess their relative efficacy.

We conducted several types of ITCs (network meta-analyses [NMA], matching-adjusted indirect comparisons [MAIC], and simulated treatment comparisons [STC]), using data from the two clinical trials, INO-VATE-ALL for InO and TOWER for Blina. The ITC results indicated higher rates of remission and of HSCT for InO over Blina, a trend favoring InO for event-free survival (EFS), and no difference between them in overall survival (OS).


Acute lymphoblastic leukemia (ALL) is a rare, heterogeneous, hematologic disease resulting from malignant transformation and proliferation of progenitor lymphoid cells [1, 2]. The disease is characterized by an accumulation of lymphoblasts in the bone marrow, peripheral blood, and other organs [1,2,3]. In adults with ALL, B cell lineage represents approximately 75% of cases, with the remaining cases being T cell lineage [1]. Precursor B cell ALL is usually associated with the expression of CD10, CD19, CD22, CD34, and CD79a on the cell surface [1, 3].

In the USA, the age-adjusted incidence rate for ALL is 1.58 per 100,000 individuals per year [1]. For 2018, it was estimated that 5920 new cases were diagnosed and 1470 deaths due to the disease were observed in the USA [4]. Diagnosis of ALL generally occurs either during childhood or later in adulthood, after 50 years of age [2].

Although ALL is the most common form of pediatric acute leukemia, the disease accounts for 20% of leukemias in adults and is particularly devastating in this population [1, 2, 5]. In adults, approximately 80–90% of patients will achieve a complete response with initial therapy; however, most will eventually relapse, with worse outcomes observed in older adults [6]. After relapse, response rates decrease, particularly for patients whose first remission was short. The 5-year survival among patients with relapsed or refractory (R/R) ALL is only 10%.

The foundation of treatment is systemically administered combination chemotherapy [2]. Induction, consolidation, and outpatient maintenance comprise the treatment phases of ALL, with central nervous system prophylaxis administered during periods of each phase. The goal of induction therapy is to achieve complete remission, after which patients may undergo allogeneic hematopoietic stem cell transplantation (HSCT) or progress to the consolidation and maintenance phases. For adult patients with R/R ALL, HSCT offers the best option for long-term survival; however, prior to HSCT, a complete response to therapy is typically required, which is achieved by only approximately 40% of patients after the first salvage therapy [2, 6, 7, 8]. The introduction of novel therapies, including immunotherapies, as salvage therapy has offered the potential for long-term survival in these patients. Immunotherapies include monoclonal antibodies, conjugated monoclonal antibodies, bispecific T cell engagers, and chimeric antigen receptor T cell therapies [6].

Inotuzumab ozogamicin (Pfizer, Philadelphia, PA, USA) and blinatumomab (Amgen, Thousand Oaks, CA, USA) are both approved for the treatment of adults with R/R B cell precursor ALL [9, 10]. Blinatumomab (Blina), a bispecific T cell engager, binds CD19 expressed on the surface of B-lineage cells to CD3 on cytotoxic lymphocytes, resulting in CD19-mediated cell death [6]. Inotuzumab ozogamicin (InO), a conjugated monoclonal antibody, is composed of a monoclonal antibody targeting CD22 covalently linked to the cytotoxic agent calicheamicin [6, 10]. After binding to the CD22 antigen on B cells, the CD22–conjugate complex is internalized and the cytotoxic agent is released, leading to apoptosis. The efficacy of both Blina and InO was demonstrated in their respective phase III studies [11, 12].

Evidence that compares the efficacy of InO with that of Blina is important for clinical and economic decision-making. To date, there have been no direct comparative studies; therefore, indirect treatment comparisons (ITCs) are needed to assess the relative efficacy of these therapies. Network meta-analysis (NMA) is an ITC method that is commonly used to estimate relative treatment effects from clinical trials that have a common comparator and patient populations that are homogeneous [13, 14]. When the study populations involved in the comparisons are heterogeneous, as is the case with INO-VATE-ALL (NCT01564784) and TOWER (NCT02013167), alternative methods can be used to adjust for imbalances in risk factors that are suspected treatment-effect modifiers before estimating relative treatment effects [14, 15]. These methods include anchored matching-adjusted indirect comparison (MAIC) and simulated treatment comparison (STC), which have been recognized by health technology assessment authorities [16].

The primary objective of this analysis was to indirectly compare the efficacy of InO with that of Blina among adult patients with R/R ALL using data from the INO-VATE-ALL and TOWER studies. Outcomes evaluated were complete remission or complete remission with incomplete hematologic recovery (CR/CRi), HSCT, overall survival (OS), and event-free survival (EFS).


Data Sources

The efficacy of InO and of Blina in adult patients with R/R ALL was demonstrated in two separate phase III, randomized, open-label studies [11, 12]. INO-VATE-ALL compared InO with standard of care (SoC) chemotherapy [12]. TOWER compared Blina with SoC chemotherapy [11, 17]. The ITCs used individual patient-level data from INO-VATE-ALL (cutoff date of January 4, 2017) and published summary data from TOWER.

No institutional review board approval was required for this study, as it was based on a post hoc analysis of previously published data from the INO-VATE-ALL and TOWER trials. These previous studies involved human participants and were conducted in accordance with the ethical standards of the institutional and/or national research committees of each study’s investigative sites, and with the 1964 Helsinki Declaration and its later amendments, or comparable ethical standards. Informed consent was obtained from all individual participants included in these previous trials.

Study Compatibility Assessment

Compatibility of the INO-VATE-ALL and TOWER studies for ITC analyses was assessed by comparing study designs, patient populations, and outcomes definitions of the two studies (Supplementary Table 1). Differences expected to potentially impact the results were adjusted in the analyses where possible.

Inclusion/Exclusion Criteria

Many of the key inclusion and exclusion criteria were similar; however, there were also some considerable differences between the two studies. First, INO-VATE-ALL enrolled approximately 15% of patients with Philadelphia chromosome-positive (Ph+) precursor B cell ALL; in contrast, TOWER included only patients with Ph-negative (Ph−) precursor B cell ALL. Only Ph− patients from INO-VATE-ALL were included in the MAIC and STC analyses. Unlike INO-VATE-ALL, patients with high peripheral blasts (> 10,000/μL) at baseline were eligible for enrollment in the TOWER study. There was no limit on the number of salvage therapies in TOWER (23% of patients had three or more), while in INO-VATE-ALL only patients with one or two salvage therapies were enrolled.

Comparator Arm in INO-VATE-ALL and TOWER

The comparator arms of the two studies were not identical; however, the chemotherapy regimens were sufficiently similar to expect comparable efficacy results. Three different chemotherapy regimens were permitted in the INO-VATE-ALL study, while four were allowed in the TOWER study. The most commonly used SoC regimen in the INO-VATE-ALL study was the combination of fludarabine, high-dose cytarabine, and granulocyte colony-stimulating factor (FLAG)-based chemotherapy, whereas the most commonly used SoC regimen in the TOWER study was FLAG chemotherapy with or without an anthracycline.

Outcome Definitions

The main outcomes of interest for this analysis were remission rate, HSCT rate, OS, and EFS. Although complete response with partial hematologic recovery (CRh) was not reported separately in INO-VATE-ALL, it was included as part of the CRi endpoint. Therefore, for remission rate analyses, the proportion of patients who achieved CR/CRi in INO-VATE-ALL was compared with the proportion of patients who achieved CR/CRi/CRh in TOWER. CR, CRi, and CRh in TOWER had to occur within 12 weeks of the first dose of therapy. Although this restriction was not employed in INO-VATE-ALL, all patients in this study achieved CR/CRi within 3.3 months of the start of treatment. All patients who received a transplant regardless of treatment response or timing were considered in the HSCT rate analysis.

Statistical Analysis

Network Meta-analysis

A standard pairwise Bucher method [18] was used to estimate an ITC between InO and Blina, using the relative effect for InO versus SoC in INO-VATE-ALL and the published relative effects for Blina versus SoC in TOWER derived from intention-to-treat (ITT) comparisons.
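The arithmetic behind a Bucher-type comparison can be sketched briefly. The example below is a minimal illustration, not the analysis code used in this study: the function names and the input values are hypothetical, and it assumes both trials report a ratio measure (OR or HR) against a shared comparator, so that the indirect log ratio is the difference of the two log ratios and the variances add.

```python
import math

def bucher_indirect(log_effect_a_vs_c, se_a_vs_c, log_effect_b_vs_c, se_b_vs_c, z=1.959964):
    """Indirect A-versus-B ratio through a common comparator C (log scale)."""
    log_ab = log_effect_a_vs_c - log_effect_b_vs_c
    # The two trials are independent, so the variances of the log ratios add.
    se_ab = math.sqrt(se_a_vs_c ** 2 + se_b_vs_c ** 2)
    return {
        "estimate": math.exp(log_ab),
        "ci_low": math.exp(log_ab - z * se_ab),
        "ci_high": math.exp(log_ab + z * se_ab),
        "se_log": se_ab,
    }

def se_from_ci(ci_low, ci_high, z=1.959964):
    """Recover the standard error of a log ratio from a published 95% CI."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * z)

# Hypothetical inputs (NOT trial results): OR 4.0 for A vs C, OR 2.0 for B vs C.
result = bucher_indirect(math.log(4.0), 0.30, math.log(2.0), 0.40)
print(result)
```

Because the standard errors combine in quadrature, the indirect estimate is always less precise than either of the within-trial estimates it is built from.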


For each outcome, likely treatment-effect modifiers were identified from the literature, stratified analyses from both the INO-VATE-ALL and TOWER studies, and clinical experts’ input. Table 1 summarizes the treatment-effect modifiers used for the anchored MAIC/STC analyses. Duration of the first remission is a stronger predictor and treatment-effect modifier among patients with only one salvage therapy than in patients in a second or later salvage phase [19]. Therefore, an adjustment in the analyses was made for the proportions of patients in each salvage treatment phase (first, or second or later) combined with the duration of the first remission (less than 12 months or equal to/greater than 12 months).

Table 1 Treatment-effect modifiers selected for analysis

The anchored MAIC technique [19] balances differences in potential treatment-effect modifiers through propensity score re-weighting of patients from INO-VATE-ALL to produce a patient profile matching that of TOWER. Estimates of relative effect (i.e., InO versus SoC in a TOWER-like population) were then derived for outcomes of interest using the re-weighted population. For rates of remission and HSCT, the relative effects of InO versus SoC in the TOWER-like population were quantified using an odds ratio (OR) and 95% confidence interval (CI) derived from a weighted logistic regression analysis. For OS and EFS, the relative effect was quantified as a hazard ratio (HR) with a 95% CI derived from a weighted Cox regression analysis (unstratified). To account for potential violation of the proportional hazard assumption due to differences in short- and long-term performance against SoC, as shown in the published OS and EFS curves in both studies, time-dependent Cox regression [20] and restricted mean survival time (RMST) [21] approaches were also performed to quantify differences in OS and EFS. Data from published OS and EFS Kaplan–Meier curves for the Blina and SoC chemotherapy arms for the ITT population were extracted and digitized, and the Guyot method [22] was used to derive virtual patient-level data to calculate RMST and perform time-dependent analyses on OS in TOWER.
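The re-weighting step can be illustrated with a small sketch. The text does not give the estimation details, so this example assumes the method-of-moments formulation commonly used for MAIC, in which each patient receives a weight exp(a · (x_i − target mean)) and the coefficients a are chosen so that the weighted covariate means equal the published means of the target trial. The function name, the single binary covariate, and all values are hypothetical; plain gradient descent is used only to keep the example dependency-free.

```python
import math

def maic_weights(ipd_covariates, target_means, iterations=500):
    """Method-of-moments weights w_i = exp(a . (x_i - target_means))."""
    centered = [[x - m for x, m in zip(row, target_means)] for row in ipd_covariates]
    n = len(centered)

    def weights_at(a):
        # Cap the exponent so a poor trial step cannot overflow math.exp.
        return [math.exp(min(sum(c * ai for c, ai in zip(row, a)), 700.0))
                for row in centered]

    alpha = [0.0] * len(target_means)
    for _ in range(iterations):
        w = weights_at(alpha)
        value = sum(w) / n  # convex objective: mean of the weights
        grad = [sum(wi * row[j] for wi, row in zip(w, centered)) / n
                for j in range(len(alpha))]
        grad_sq = sum(g * g for g in grad)
        step = 1.0
        while step > 1e-12:  # backtracking line search
            trial = [ai - step * g for ai, g in zip(alpha, grad)]
            if sum(weights_at(trial)) / n <= value - 0.5 * step * grad_sq:
                alpha = trial
                break
            step *= 0.5
    return weights_at(alpha)

# Hypothetical data: 4 of 10 patients have the characteristic (40%),
# while the target trial reports 60%.
ipd = [[1.0]] * 4 + [[0.0]] * 6
weights = maic_weights(ipd, [0.6])
weighted_mean = sum(w * row[0] for w, row in zip(weights, ipd)) / sum(weights)
print(round(weighted_mean, 4))
```

At the minimum of the objective the gradient is zero, which is exactly the condition that the weighted, centered covariates sum to zero, so the re-weighted population reproduces the target means.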

In the anchored STC [15], patient-level data from INO-VATE-ALL were used to create a separate predictive equation for each outcome of interest; these equations were then used to estimate the relative treatment effect of InO versus SoC in a TOWER-like population. All treatment-effect modifiers were included in the regression equations. Logistic regression models were used for rates of remission and HSCT, and the relative effect was quantified as an OR with a 95% CI. Results were also presented as differences in rates with an approximate 95% CI. Cox proportional hazard models were used for OS and EFS, and the relative effect was quantified as an HR with a 95% CI. Because the treatment effect was not proportional over time, time-dependent treatment effects on OS for InO versus SoC were also derived.
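For a binary outcome, one common way to write such a predictive equation is shown below. This is a generic formulation given for orientation only, not the fitted model from this analysis; the symbols are assumptions introduced here. Covariates are centered on the published TOWER means so that the treatment coefficient is directly interpretable in a TOWER-like population.

```latex
\operatorname{logit}\,\Pr(Y_i = 1)
  = \beta_0
  + \boldsymbol{\beta}_1^{\top}\bigl(\mathbf{x}_i - \bar{\mathbf{x}}_{\mathrm{TOWER}}\bigr)
  + \Bigl[\beta_T + \boldsymbol{\beta}_2^{\top}\bigl(\mathbf{x}_i - \bar{\mathbf{x}}_{\mathrm{TOWER}}\bigr)\Bigr]\, t_i
```

Here $t_i = 1$ for InO and $t_i = 0$ for SoC, $\mathbf{x}_i$ holds the treatment-effect modifiers for patient $i$, and $\exp(\beta_T)$ is the OR for InO versus SoC evaluated at the TOWER covariate means. For OS and EFS the same linear predictor would enter a Cox model, with $\exp(\beta_T)$ read as an HR.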

Treatment effects for InO versus Blina were derived using the Bucher method, which compared adjusted relative effects derived from anchored MAIC or STC analyses of InO versus SoC in INO-VATE-ALL and published (or obtained using virtual patient-level data) relative effects for Blina versus SoC in TOWER.


Matching Patient Baseline Characteristics

Patient baseline characteristics for the ITT population of both studies are presented in Table 2. Before matching, there were considerable differences between populations in Ph chromosome status, number of previous salvage therapies, duration of first remission, age, geographic region, and history of HSCT.

Table 2 Patient baseline characteristics

Once the weights were applied, all proportions for the set of patient characteristics included in the matching were similar between the two study populations. However, it was not possible to fully account for differences in the number of salvage treatment phases, because the TOWER study placed no limit on the number of salvage therapies (23% of patients had three or more), whereas the INO-VATE-ALL study enrolled only patients with no more than two salvage therapies. Thus, the matched proportion of patients with two or more salvage treatment phases in INO-VATE-ALL includes only patients with two salvage treatment phases, whereas in TOWER it also includes patients who had three or more. The INO-VATE-ALL sample size after matching (i.e., effective sample size) was reduced by approximately 50% or more for all outcomes, except HSCT rate (Table 3; Supplementary Table 2).
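The effective sample size after weighting is conventionally computed as the squared sum of the weights divided by the sum of the squared weights. The short sketch below assumes that standard definition; the function name and the weights are hypothetical.

```python
def effective_sample_size(weights):
    """ESS = (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq

# Equal weights leave the sample size unchanged; unequal weights reduce it.
print(effective_sample_size([1.0, 1.0, 1.0, 1.0]))
print(effective_sample_size([3.0, 1.0]))
```

The more the weights vary across patients, the smaller the effective sample size, which is why matching to a dissimilar population costs precision.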

Table 3 Efficacy outcomes

Indirect Treatment Comparisons

In comparison to the results obtained using NMA, anchored MAIC and STC results indicated stronger treatment effects for InO relative to Blina for most of the outcomes (Table 3; Supplementary Table 2). As shown in Table 3, the odds of remission were statistically significantly greater with InO when compared with Blina, regardless of the ITC method used (OR [95% CI] NMA 2.63 [1.35, 5.12]; MAIC 2.81 [1.12, 7.05]; STC 3.91 [1.53, 9.99]). When comparing HSCT rates between InO and Blina, results indicated a statistically significantly higher rate among patients who received InO (OR [95% CI] NMA 3.23 [1.63, 6.40]; MAIC 4.11 [1.85, 9.12]; STC 3.77 [1.71, 8.35]). Similarly, when treatment effects were estimated in terms of rate difference, results also indicated statistically significantly higher remission and HSCT rates for InO versus Blina (remission rate difference [95% CI] NMA 23.64 [10.10, 37.20]; MAIC 25.12 [6.60, 43.70]; STC 31.44 [13.80, 49.10] and HSCT rate difference [95% CI] NMA 25.85 [12.50, 39.20]; MAIC 31.03 [15.50, 46.50]; STC 29.33 [13.70, 44.90]).

For EFS, the ITC analyses indicated a favorable trend for InO compared with Blina (Supplementary Table 2; Fig. 1; see TOWER study Fig. 1c [11]). With the MAIC adjustment, results showed statistically significantly higher RMST differences and ratios for InO compared with Blina, suggesting a longer mean EFS for patients who received InO. Results were not statistically significant when the EFS treatment effect was quantified using an HR.
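RMST is the area under the Kaplan–Meier curve up to a chosen truncation time, so an RMST difference is the gain in mean event-free time over that window. The sketch below shows the calculation on hypothetical, unweighted data; the function names and values are illustrative assumptions, and a MAIC-adjusted analysis would use a weighted Kaplan–Meier estimate instead.

```python
def km_curve(times, events):
    """Kaplan-Meier estimate as a list of (event_time, survival) steps."""
    at_risk = len(times)
    survival = 1.0
    steps = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= leaving  # events and censored patients leave the risk set
    return steps

def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area, last_time, last_surv = 0.0, 0.0, 1.0
    for t, s in km_curve(times, events):
        if t >= tau:
            break
        area += last_surv * (t - last_time)
        last_time, last_surv = t, s
    return area + last_surv * (tau - last_time)

# Hypothetical arms (months; 1 = event, 0 = censored), truncated at 8 months.
arm_a = rmst([2, 4, 6, 8], [1, 0, 1, 1], tau=8)
arm_b = rmst([1, 2, 3, 8], [1, 1, 1, 1], tau=8)
print(arm_a - arm_b, arm_a / arm_b)
```

Unlike an HR, the RMST difference and ratio do not rely on proportional hazards, which is why they can reach significance when the HR does not.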

Fig. 1

Event-free survival Kaplan–Meier curves before matching (a) and after matching (b). a INO-VATE-ALL before matching. b INO-VATE-ALL after matching. EFS event-free survival, InO inotuzumab ozogamicin, SoC standard of care

The OS curves appeared to depart from the proportional hazards assumption that standard Cox regression requires (Fig. 2). Therefore, the OS HRs should be interpreted with caution (Supplementary Table 2). For OS, overall and time-dependent adjusted HRs (95% CIs) and RMST difference and ratio from anchored MAIC and STC analyses were consistent and revealed no statistically significant difference in OS between InO and Blina (see TOWER study Fig. 1a [11]).

Fig. 2

Overall survival Kaplan–Meier curve before matching (a) and after matching (b). a INO-VATE-ALL before matching. b INO-VATE-ALL after matching. InO inotuzumab ozogamicin, OS overall survival, SoC standard of care


The aim of this study was to indirectly compare the treatment effects of InO and Blina, given the absence of a head-to-head comparison of these two drugs in adult R/R ALL. Results presented here indicated that remission and HSCT rates were significantly higher for InO than for Blina regardless of the ITC method applied. After adjustment using anchored MAIC and STC analyses, the estimated treatment effect favoring InO was stronger than in the unadjusted, naïve analysis. Results from the anchored MAIC and STC approaches were generally consistent. Results also suggested longer EFS with InO than with Blina. Although a statistically significant difference was not observed when EFS was quantified using HRs, RMST differences and ratios from the MAIC analysis significantly favored InO over Blina.

Results from both the overall and time-dependent analyses indicated that there was no significant difference between InO and Blina in OS. Given the limited follow-up data from TOWER (due to the study meeting its early stopping criteria at month 20), relative effects could not be reliably estimated after 15 months of follow-up; this should be considered when interpreting the results of this analysis. In INO-VATE-ALL, the separation in OS curves between InO and SoC accelerated at 15 months, indicating longer-term survival benefit, which is consistent with the higher HSCT rates that were observed among patients in the InO arm. For the TOWER study, the OS curves for Blina and SoC converged at about this same time point for the ITT population, which is consistent with the identical HSCT rates observed among patients in the Blina and SoC arms. In light of these observations, it is possible that the findings from the ITC may underestimate the relative OS for InO.

Recently, anchored MAIC analyses comparing Blina with InO for OS and CR rate were published by Song et al. 2019 [23]. That study found no difference in CR rates; furthermore, restricted mean survival time for Blina was 1.6 months (95% CI [0.1, 3.2]; p < 0.05) longer than for InO when the authors applied the RMST method using only 12 months of OS data. However, when the entire follow-up period was considered, no statistically significant RMST difference in OS was found between InO and Blina. The authors used patient-level data from the TOWER study and published data from the INO-VATE-ALL study. Since the TOWER study enrolled only Ph− precursor B cell ALL patients, it was not possible to adjust for Ph status in the analyses conducted by Song et al., and therefore the reported results could be biased in favor of Blina.

On the other hand, Song et al. could fully adjust for the number of salvage therapies by excluding patients who had received three or more therapies; this was not possible in our analyses. Song et al. did not compare HSCT or CRi/CRh remission rates, despite remission rate being the primary endpoint in both studies. As they discussed, the definitions of CRi/CRh in the TOWER and INO-VATE-ALL studies were not identical. However, all our ITC analyses were anchored; therefore, such differences in definition are unlikely to affect the relative effects within each study or to change the derived treatment effects of InO versus Blina. The CRi/CRh rates are important to include because these responses also enable patients to proceed to HSCT and benefit from long-term survival and potential cure.

Evidence-based healthcare decision-making for clinical treatment guidelines and reimbursement policies requires comparisons of all relevant competing interventions. Direct comparative studies are rarely available in the period immediately following approval of a treatment, since the primary goal of clinical development is to meet regulatory requirements [14, 15]. In the absence of randomized controlled trials involving a direct comparison of all treatments of interest, ITCs provide useful evidence for judiciously selecting the best choice(s) of treatment [24].

The indirect comparisons made here are subject to limitations. NMA and anchored MAIC and STC analyses rely on a common comparator arm. The SoC intensive chemotherapy regimens used as the comparator arms for INO-VATE-ALL and TOWER were not identical. Since no randomized clinical studies have compared those regimens head-to-head, it is possible that some SoC regimens are slightly more efficacious than others; however, it is important to note that in both studies the remission rate with SoC was low, consistent with historical data.

In our analyses, the SoC regimens were considered similar in terms of efficacy. MAIC and STC analyses depend on all treatment-effect modifiers being adjusted for. Although we adjusted for most of the treatment-effect modifiers identified, some were either not adjusted for or only partially adjusted for, such as the number of prior salvage therapies and high peripheral blasts. Thus, the treatment effect for InO over Blina may have been overestimated. The effective sample size in MAIC analyses was reduced by 50% or more for most of the outcomes evaluated, which led to wider 95% CIs and increased uncertainty in the results.


The results of this adjusted ITC showed a statistically significant advantage for InO compared with Blina on improving rates of remission and HSCT. Further head-to-head randomized controlled studies of InO versus Blina are needed to confirm the findings presented in this manuscript.