Background

The past two decades witnessed major advances in the treatment of multiple myeloma (MM), including the introduction of high-dose therapy (HDT) (chemotherapy or chemoradiotherapy), autologous hematopoietic cell transplantation (auto-HCT), and other effective therapies such as immunomodulatory drugs and proteasome inhibitors, namely bortezomib [1–5]. These new agents, when used in combination, have led to improved survival and to a higher frequency and better quality of response, but have not translated into a cure for this disease [3, 4].

The concept of a "total therapy" treatment approach for patients with newly diagnosed MM, using multi-agent induction regimens, tandem auto-auto HCT, and post-transplantation maintenance, resulted in a progressive increase in the proportion of patients achieving complete remission (CR) [6]. The Intergroupe Francophone du Myelome (IFM) demonstrated that tandem auto-auto HCT improves overall survival (OS) among patients with myeloma, particularly if a very good partial response (VGPR) is not achieved after the first auto-HCT [7]. A meta-analysis by our group showed that tandem auto-auto HCT, compared with single auto-HCT, in previously untreated MM results in improved response rates but not improved OS [8].

Badros et al. demonstrated the feasibility of offering reduced-intensity conditioning (RIC) allogeneic (allo)-HCT as a salvage strategy in 31 patients with relapsed MM [9]. Seventeen (55%) of the 31 patients had received at least two auto-HCTs, and 17 (55%) had progressive disease at the time of allografting [9]. Despite these adverse clinical features, 19 (61%) patients achieved CR or near CR, with 100-day and overall non-relapse mortality (NRM) of 10% and 29%, respectively [9]. This suggests that a beneficial graft-versus-myeloma (GVM) effect, mediated by alloreactive donor T cells, is capable of disease control even in MM refractory to HDT. Gahrton et al. compared outcomes of patients who received allo-HCT for relapsed MM during 1983–1993 and 1994–1998, showing improvement in NRM and OS for patients allografted during the later time period [10]. The authors speculated that a shorter time to allografting (10 months versus 14 months) for patients transplanted during the later period probably contributed to this beneficial effect [10]. Similar results were recently reported by Kumar et al.: 1-year OS after allo-HCT improved across three successive eras (1989–1994, 1995–2000, and 2001–2005), and a longer interval between MM diagnosis and allografting was found to be an independent adverse prognostic factor for OS [11].

Combining the benefits of cytoreductive therapy from HDT and auto-HCT with adoptive immunotherapy from allo-HCT forms the basis of the auto-allo HCT treatment strategy in patients with MM. Conflicting results, however, have been noted when an auto-allo HCT approach has been compared with an auto-auto HCT strategy. A recent systematic review of this question was performed by Armeson et al. [12]. However, in our opinion, that review is limited by the inclusion of an inappropriate study: the study by Garban et al., which was not a true randomized controlled trial but rather a comparison of two parallel trials (IFM99-03 and IFM99-04) that enrolled allograft and autograft recipients separately. Most importantly, the systematic review by Armeson et al. did not attempt to evaluate the methodological quality of the included studies, which is one of the key reasons to conduct a systematic review. Assessing risk of bias as part of the systematic review process helps to explain whether the observed findings reflect the effect of the intervention or are the result of bias. Accordingly, we performed a systematic review of published studies comparing auto-auto HCT with auto-allo HCT in patients with newly diagnosed MM that addresses the issues left unaddressed in the systematic review by Armeson et al.

Results

The initial search yielded 152 references and 2 abstracts, of which 149 were excluded for various reasons as shown in Figure 1. Five studies (four full manuscripts and one abstract) enrolling a total of 1538 patients were eligible for inclusion in this meta-analysis [13–17]. In one case [15], we identified a complementary publication [18] which provided longer follow-up on the originally published study. Additionally, we excluded one manuscript [19] because it was an indirect comparison (i.e. patients were enrolled separately into two parallel trials, IFM99-03 and IFM99-04, with different primary endpoints and subsequently compared to each other). Finally, we excluded one abstract, HOVON50/54, because patients on the control arm received only a single auto-HCT [20].

Figure 1

Flow-diagram depicting the identification and selection of eligible studies for inclusion in the systematic review.

Patient, disease and treatment characteristics

Table 1 summarizes extracted data pertinent to patients' disease and treatment characteristics. All studies allocated patients to auto-allo HCT if an HLA-matched sibling donor was available, except one [16] where matched volunteer unrelated donors were permitted. For patients undergoing tandem auto-auto HCT, high-dose melphalan 200 mg/m² (MEL200) was the preferred regimen for the first autograft in three studies [13, 14, 16], a melphalan dose ranging from 100 to 200 mg/m² was used in one study [15], and a melphalan dose ranging from 140 (with total body irradiation) to 200 mg/m² was used in another study [17]. For the second autograft, MEL200 was the preferred regimen in two studies [13, 16]. In the study by Bruno et al., patients were offered a dose of melphalan ranging from 100 to 200 mg/m² [15], whereas Rosiñol et al. allowed MEL200 or a combination of cyclophosphamide, etoposide, and BCNU [17]. Moreover, Björkstrand et al. gave patients the option to undergo a second autograft using MEL200 or to forgo a second autograft [14]. For the purpose of this meta-analysis, only patients who received a second autograft were included in the analysis.

Table 1 Characteristics of biologically randomized studies in tandem autologous versus autologous-allogeneic hematopoietic cell transplantation for patients with multiple myeloma

For patients who received an auto-allo HCT approach, MEL200 was the preferred regimen for autografting in four studies [13–16]. A RIC regimen of 2 Gy TBI was the preparative regimen in two studies [13, 15]. Björkstrand et al. combined fludarabine with 2 Gy TBI [14], while the two remaining studies employed a RIC regimen with fludarabine/melphalan for allo-HCT [16, 17]. No specific disease-risk eligibility criteria were required except in one study, which limited enrollment to patients with deletion of chromosome 13q [16].

Methodological quality

Methodological quality of included studies is summarized in Table 2. Briefly, all five studies utilized biologic randomization. Four studies reported data on prognostic factors, and groups were balanced for the presence of associated prognostic risk factors [13–15, 17, 18], while one study did not report data on prognostic factors [16]. None of the studies reported whether all consecutive patients were enrolled. Four studies had at least a 1:2 ratio of auto-allo HCT to auto-auto HCT patients, while one study [17] had a 1:3.4 ratio. None of the five studies reported blinding of any study personnel. Four studies [13–15, 17, 18] reported using the same reference time for assessing time-dependent outcomes, while one study [16] did not report a reference time. Three studies [13–15, 18] reported outcomes according to intention-to-treat (ITT) and three studies [14, 15, 17, 18] reported harms for patients treated per protocol. One study reported an a priori expected difference, pre-specified α and β error, and a sample size calculation [13].

Table 2 Methodological quality of biologically randomized studies in tandem autologous versus autologous-allogeneic hematopoietic cell transplantation for patients with multiple myeloma

Benefits

Summary of all evidence is presented in Table 3.

Table 3 Summary of evidence for tandem autologous versus autologous-allogeneic hematopoietic cell transplantation in patients with multiple myeloma

Response rates

Response data were reported per protocol in four studies, and one study reported all outcomes according to both ITT and per protocol [14]. Two studies [14, 17] used European Bone Marrow Transplantation (EBMT) criteria [21] for response assessment and one study [13] used the International Uniform Response (IUR) Criteria [22], while Bruno et al. described their own, more stringent, CR and PR criteria [15]. One study did not report how response was assessed [16].

As illustrated in Figure 2A-C, the pooled results (three studies [14, 16, 18] with 498 patients) showed no significant difference in overall response rate (ORR) between auto-allo HCT and auto-auto HCT [risk ratio (RR) (95% confidence interval [CI]) = 0.98 (0.92-1.05), p = 0.66]. There was low heterogeneity between pooled studies for the outcome of ORR (I² = 25%). The pooled results for CR from five studies [13, 14, 16–18] (1130 patients) showed a statistically significant benefit of treatment with auto-allo HCT over auto-auto HCT [RR (95% CI) = 1.65 (1.25-2.19), p ≤ 0.001]. However, there was statistically significant heterogeneity among pooled studies (I² = 68%). Results for at least VGPR (one study [13] enrolling 522 patients) showed no significant difference between the two treatment strategies [RR (95% CI) = 0.97 (0.87-1.09), p = 0.66].
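A reported ratio and its 95% CI carry enough information to recover the log-scale standard error, and from it an approximate z statistic and two-sided p value. The sketch below applies this to the pooled CR result above, assuming a normal approximation on the log scale; the function name and constant are illustrative and not part of the original analysis.

```python
import math

Z_975 = 1.959964  # normal quantile for a two-sided 95% interval


def log_ratio_stats(estimate, lower, upper):
    """Recover log-scale SE, z and two-sided p from a ratio and its 95% CI.

    Assumes the CI is symmetric on the log scale (normal approximation).
    """
    log_est = math.log(estimate)
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    z = log_est / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return log_est, se, z, p


# Pooled complete-remission result reported in the text: RR 1.65 (1.25-2.19)
log_rr, se, z, p = log_ratio_stats(1.65, 1.25, 2.19)
print(f"log RR = {log_rr:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

Because the published estimates are rounded to two decimals, p values recovered this way are only approximate, and the discrepancy is larger for ratios close to 1.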

Figure 2

Panels A through C: Forest plots for response rates: overall (A), complete (B), or at least very good partial response (C). The summary estimate (risk ratio) from individual studies is indicated by rectangles with lines representing the 95% confidence intervals (CIs). The summary pooled estimate from all studies is represented by the diamond, and the width of the diamond indicates the corresponding 95% CI.

Event-free survival

None of the studies reported definitions for event-free survival (EFS) or progression-free survival (PFS). For this analysis, EFS data were used when reported; otherwise PFS was substituted. As presented in Figure 3A, the pooled results from three studies [13, 14, 18] (1229 patients) which reported EFS according to ITT showed no significant difference between treatment with auto-allo HCT and auto-auto HCT [hazard ratio (HR) (95% CI) = 0.83 (0.60-1.15), p = 0.26]. Pooled results from three studies [14, 17, 18] (409 patients) which reported EFS per protocol also showed no significant difference between auto-allo HCT and auto-auto HCT [HR (95% CI) = 0.78 (0.58-1.05), p = 0.11]. Heterogeneity among studies included in the ITT analysis was significant (I² = 77%), while heterogeneity in the per-protocol analysis was moderate (I² = 32%).

Figure 3

Panels A and B: Forest plots for event-free survival according to intent-to-treat analysis (A) and overall survival (B). The summary estimate (hazard ratio) from individual studies is indicated by rectangles with lines representing the 95% confidence intervals (CIs). The summary pooled estimate from all studies is represented by the diamond, and the width of the diamond indicates the corresponding 95% CI.

Overall survival

As illustrated in Figure 3B, the pooled results (three studies [13, 14, 18] enrolling 1229 patients) for OS according to ITT showed no significant difference between treatment with auto-allo HCT and auto-auto HCT [HR (95% CI) = 0.80 (0.48-1.32), p = 0.39]. The pooled results from two studies [17, 18] (214 patients) which reported OS per protocol also showed no significant difference between the two treatment modalities [HR (95% CI) = 0.88 (0.33-2.35), p = 0.79]. There was statistically significant heterogeneity whether OS was analyzed according to ITT (I² = 85%) or per protocol (I² = 77%).
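The per-protocol OS pooling can be illustrated with the two study-level hazard ratios quoted in the sensitivity analysis, 0.55 (0.32-0.94) and 1.51 (0.70-3.27). The sketch below is a minimal generic inverse-variance, random-effects calculation that assumes the DerSimonian-Laird estimator of between-study variance; the source states only that a random-effects model was used, so the choice of estimator and all function names are assumptions.

```python
import math

Z_975 = 1.959964  # normal quantile for a two-sided 95% interval


def from_ci(hr, lower, upper):
    """Return (log HR, variance) recovered from a reported HR and 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    return math.log(hr), se ** 2


def dersimonian_laird(effects):
    """Pool (log effect, variance) pairs under a random-effects model."""
    w = [1 / v for _, v in effects]
    sw = sum(w)
    fixed = sum(wi * y for wi, (y, _) in zip(w, effects)) / sw
    # Cochran's Q and the I^2 statistic
    q = sum(wi * (y - fixed) ** 2 for wi, (y, _) in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Between-study variance (method of moments), truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    w_star = [1 / (v + tau2) for _, v in effects]
    pooled = sum(ws * y for ws, (y, _) in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return {
        "hr": math.exp(pooled),
        "lower": math.exp(pooled - Z_975 * se),
        "upper": math.exp(pooled + Z_975 * se),
        "i2": i2,
        "tau2": tau2,
    }


# Study-level per-protocol OS hazard ratios as quoted in the text
studies = [from_ci(0.55, 0.32, 0.94), from_ci(1.51, 0.70, 3.27)]
result = dersimonian_laird(studies)
print({k: round(v, 2) for k, v in result.items()})
```

Under these assumptions the pooled estimate and I² come out close to the reported per-protocol values, which suggests, but does not prove, that a calculation of this kind underlies them.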

Harms

Non-relapse mortality

Pooled results from four studies [13, 14, 17, 18] (1047 patients) showed that NRM was significantly worse with an auto-allo HCT approach than with auto-auto HCT [RR (95% CI) = 3.55 (2.17-5.80), p < 0.00001] (Figure 4A). There was no heterogeneity among included studies (I² = 0%).

Figure 4

Panels A through C: Forest plots for non-relapse mortality (A), grade II-IV graft-versus-host disease (B), and chronic graft-versus-host disease (C). The summary estimate (risk ratio/proportions) from individual studies is indicated by rectangles with lines representing the 95% confidence intervals (CIs). The summary pooled estimate from all studies is represented by the diamond, and the width of the diamond indicates the corresponding 95% CI. For the proportional meta-analysis, the diamond represents the pooled summary estimate and the 95% CI is indicated by the line.

Graft-versus-host disease

Incidence of any acute graft-versus-host disease (GVHD) was reported in one study [14] (91 patients), and the proportion of patients undergoing auto-allo HCT with any GVHD was 30.77% (95% CI 21.51-41.32). Incidence of grade II-IV GVHD was reported in four studies [13, 14, 17, 18] (363 patients), and the pooled proportion of patients undergoing auto-allo HCT with grade II-IV GVHD was 28.26% (95% CI 20.65-36.55; see Figure 4B). Heterogeneity among studies reporting grade II-IV GVHD was borderline (I² = 59%). Incidence of chronic GVHD was reported in four studies [13, 14, 17, 18] (356 patients), and the pooled proportion of patients undergoing auto-allo HCT with chronic GVHD was 60.69% (95% CI 50.65-70.29; Figure 4C). Heterogeneity among studies reporting chronic GVHD was significant (I² = 67%).
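The Freeman-Tukey double arcsine transformation used for proportions can be sketched for the single-study acute GVHD estimate. The event count of 28 is inferred from the reported 30.77% of 91 patients and is an assumption, as is the use of Miller's inversion formula for the back-transform; the source does not describe these details. The interval produced here is a normal approximation on the transformed scale and need not match the reported interval, which may have been computed by a different (for example exact) method.

```python
import math

Z_975 = 1.959964  # normal quantile for a two-sided 95% interval


def ft_transform(events, n):
    """Freeman-Tukey double arcsine transform and its approximate variance."""
    t = (math.asin(math.sqrt(events / (n + 1)))
         + math.asin(math.sqrt((events + 1) / (n + 1))))
    return t, 1 / (n + 0.5)


def ft_back_transform(t, n):
    """Back-transform to a proportion using Miller's inversion formula."""
    s = math.sin(t)
    inner = 1 - (s + (s - 1 / s) / n) ** 2
    sign = math.copysign(1.0, math.cos(t))
    return 0.5 * (1 - sign * math.sqrt(max(0.0, inner)))


events, n = 28, 91  # 28 is inferred from 30.77% of 91 patients (assumption)
t, var = ft_transform(events, n)
se = math.sqrt(var)
p_hat = ft_back_transform(t, n)
ci_low = ft_back_transform(t - Z_975 * se, n)
ci_high = ft_back_transform(t + Z_975 * se, n)
print(f"proportion = {p_hat:.4f}, 95% CI {ci_low:.4f} to {ci_high:.4f}")
```

For a pooled proportion, the same back-transform would be applied to the weighted mean of the study-level transformed values, with a representative sample size in place of `n`.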

Sensitivity analysis/subgroup analysis

To assess robustness of the pooled results and explore possible reasons for heterogeneity, additional sensitivity and subgroup analyses were performed (see Table 4). To evaluate robustness of response outcomes, sensitivity analysis was performed according to response criteria (EBMT [21], IUR [22], non-EBMT/IUR [15], and not reported). There was no significant difference in ORR or CR regardless of the criteria used. Sensitivity analysis for the primary outcome of OS was performed according to all elements of risk of bias. A significant difference in pooled results was detected only when the per-protocol analysis of OS in a study (104 patients) which had at least a 1:2 ratio of auto-allo HCT to auto-auto HCT patients [HR (95% CI) = 0.55 (0.32-0.94), p = 0.03] was compared with the per-protocol analysis of OS in a study (110 patients) which did not have at least a 1:2 ratio [HR (95% CI) = 1.51 (0.70-3.27), p = 0.30]. Sensitivity analysis according to risk of bias did not explain the observed heterogeneity of the primary outcome. For risk of random error, while one study [13] (710 patients) which reported sample size calculations showed no difference in OS [HR (95% CI) = 1.24 (0.94-1.64), p = 0.13], the pooled results from two studies [14, 18] (519 patients) which did not report sample size calculations showed a significant OS benefit with use of auto-allo HCT versus auto-auto HCT [HR (95% CI) = 0.64 (0.43-0.95), p = 0.03]. There was statistically significant heterogeneity between the two studies which did not report sample size calculations (I² = 58%).
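One common way to compare two subgroup estimates is a z test on the difference of their log hazard ratios. The sketch below applies it to the two per-protocol OS results quoted above; the source does not state which test was used to judge subgroup differences, so this is an illustration under that assumption, with standard errors recovered from the rounded confidence intervals.

```python
import math

Z_975 = 1.959964  # normal quantile for a two-sided 95% interval


def log_hr_and_se(hr, lower, upper):
    """Log HR and its SE recovered from a reported HR and 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_975)
    return math.log(hr), se


def subgroup_difference(a, b):
    """z test for the difference between two independent log HRs."""
    (ya, sa), (yb, sb) = a, b
    diff = ya - yb
    se_diff = math.sqrt(sa ** 2 + sb ** 2)
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return diff, se_diff, z, p


ratio_met = log_hr_and_se(0.55, 0.32, 0.94)      # study with >= 1:2 ratio
ratio_not_met = log_hr_and_se(1.51, 0.70, 3.27)  # study without 1:2 ratio
diff, se_diff, z, p = subgroup_difference(ratio_met, ratio_not_met)
print(f"difference in log HR = {diff:.3f}, z = {z:.2f}, p = {p:.3f}")
```

With only one study per subgroup, such a test is equivalent to comparing two trials directly and should be read as exploratory.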

Table 4 Sensitivity analyses by response criteria and significant elements of quality

Discussion

Auto-HCT has been regarded as the standard of care for younger myeloma patients [1, 23]. However, much controversy exists about the role and timing of allo-HCT in newly diagnosed MM. Our meta-analysis indicates that, despite higher CR rates following an auto-allo HCT approach, there is no apparent improvement in OS, whether the comparative analysis is performed per protocol or on an ITT basis. This is likely explained by the significantly higher NRM associated with RIC allo-HCT versus a second auto-HCT [RR (95% CI) = 3.55 (2.17-5.80), p < 0.00001]. Accordingly, further improvements in the auto-allo HCT approach will require strategies to significantly reduce NRM and augment anti-myeloma effects. Not surprisingly, a significant cause of NRM in the auto-allo HCT arm was the development of acute and/or chronic GVHD. For instance, in the study by Krishnan et al., eight (13%) of 60 deaths were attributed to GVHD [13]. Similarly, in the study by Rosiñol et al., three (75%) of four cases of NRM were from complications of acute GVHD [17]. This suggests that future treatment strategies aimed at exploiting GVM effects in an auto-allo HCT approach should avoid exacerbating GVHD at all costs. It is noteworthy that the OS benefit with an auto-allo HCT approach is limited to studies using 2 Gy TBI-based conditioning regimens [14, 15], which has led to speculation [14] that the lack of survival benefit in other studies might relate to the use of more intense conditioning, which is associated with increased regimen-related toxicity and mortality in those studies [16, 17]. It is important to note that the largest trial, by Krishnan et al. [13], used 2 Gy TBI conditioning but was also subject to referral bias, and to date has not reported any survival benefit.

Conceptually, the auto-allo HCT approach combines the advantage of cytoreduction from HDT with the first autograft with the benefit of adoptive immunotherapy resulting from donor T-cell alloreactivity. Nevertheless, in the study by Krishnan et al., 22 (37%) of 60 deaths in the auto-allo HCT arm were still due to MM [13]. As a result, future strategies should aim at achieving deeper remissions, namely molecular remissions or a state of minimal residual disease, prior to moving forward with allografting. This might entail evaluating novel potent therapies during the peri-allografting phase. Moreover, designing more effective regimens for allo-HCT, beyond 2 Gy TBI, is likely necessary to improve outcomes.

With regard to using auto-auto HCT as the control arm for comparison in these studies, one could argue that this approach is not yet considered the standard of care in all patients with newly diagnosed MM. In fact, outcomes from various studies comparing single auto-HCT with a tandem auto-auto approach have been discrepant [7, 24, 25], and a published meta-analysis failed to show an OS benefit with tandem autografts [8].

A major limitation of all studies comparing auto-auto HCT to auto-allo HCT is the lack of detailed information about disease/genetic risk stratification. Only one study limited accrual to patients with deletion 13q detectable by FISH [16]. However, the prognostic significance of 13q deletion detected by FISH, as opposed to conventional cytogenetics, remains questionable [26]. Whether an auto-allo HCT approach might be beneficial for high-risk MM is not known and should be further assessed in future trials [27–29]. We were not able to assess whether an auto-allo HCT approach might be beneficial for high-risk myeloma patients, as the included studies did not report results according to risk categories for all outcomes. An individual patient data meta-analysis would be suitable to answer this question. Furthermore, the results are prone to outcome reporting bias, as only three studies reported OS data according to ITT [13, 14, 18] and another study reported data using per-protocol analysis only [17].

The findings are also somewhat different from those of the systematic review by Armeson et al., as we excluded the manuscript published by Garban et al. because it compared two parallel trials (IFM99-03 and IFM99-04) which enrolled allograft and autograft recipients separately [12]. The objectives of the IFM99-03 trial were to evaluate the feasibility and NRM of RIC allografting [19], whereas the primary end point of IFM99-04 was to compare CR rates achieved after the second auto-HCT (with or without anti–IL-6 monoclonal antibody BE-8). Additionally, we excluded a cohort of high-risk patients reported in the study by Krishnan et al. because the original aim of that study was to assess progression-free survival among standard-risk patients [13]. The investigators reported only partial data on a smaller cohort of high-risk patients.

Conclusions

Efforts to identify particular subgroups of patients with MM who are likely to benefit from an auto-allo HCT approach, based on prognostic clinical, biological, cytogenetic and genetic risk factors, are necessary to help refine the role of this approach in MM. At present, the totality of evidence suggests that an auto-allo HCT approach for patients with newly diagnosed myeloma should not be offered outside the setting of a clinical trial.

Methods

Identification of eligible studies

Any completed study in newly diagnosed MM patients comparing auto-auto HCT versus auto-allo HCT was eligible for inclusion in this systematic review. Studies which did not utilize biologic randomization or were indirect comparisons of tandem auto-auto HCT versus auto-allo HCT were excluded.

A systematic search of the MEDLINE database through Nov 5, 2011, and of pertinent conference proceedings (American Society of Hematology, American Society of Clinical Oncology, European Hematology Association, American Society for Blood and Marrow Transplantation, and EBMT Group) was conducted to identify relevant publications. The following search strategy was used: ("Multiple Myeloma" [Mesh] AND "Transplantation, Autologous" [Mesh] AND "Transplantation, Homologous" [Mesh]). No search limits were applied based on language.

Study selection and data extraction

Two authors (M.A.K-D and M.H.) appraised the list of references and selected studies in consultation with other authors (T.R. and A.K.). Disagreements were resolved by consensus. Dual data extraction on clinical outcomes, treatment benefits and harms, and methodological quality of included studies was undertaken. Since biologic randomization is not similar to traditional randomized controlled trials, not all elements of risk of bias were applicable. For methodological quality, we extracted data on the following elements: comparability of two groups on all aspects except the intervention (e.g. disease stage, age, gender, etc.), enrollment of consecutive patients, enrollment of patients in auto-allo and auto-auto group in at least 1:2 ratio, description of withdrawals and dropouts (if any), blinding of study personnel and who was blinded (e.g. data collectors, outcome assessors etc.), comparability of reference time used for time-dependent outcomes between treatment groups and analysis according to ITT principle for benefits and per-protocol for adverse events. Clinical outcomes analyzed included: response rates (ORR, CR and VGPR), OS, EFS, NRM and GVHD. For purposes of this review, OS was considered the primary outcome; response rates, EFS, NRM and GVHD were considered secondary outcomes.

Statistical analysis

Dichotomous data were summarized using RR based on the number of events and total number of patients and pooled under a random-effects model. For time-to-event data, HR and 95% CI were extracted when reported. When authors did not report time-to-event estimates, we extracted data from the publication using methods described by Tierney et al. [30]. Time-to-event data were pooled using generic inverse variance under a random-effects model. For analysis of proportional data, methods by Stuart et al. [31] were used to transform proportions into a quantity according to the Freeman-Tukey variant of the arcsine square root transformed proportion [31]. The pooled proportion was calculated as a back-transform of the weighted mean of the transformed proportions, using a random-effects model [31]. All data are reported with 95% CI. The I² statistic was calculated to assess heterogeneity. An I² > 50% was considered statistically significant heterogeneity [32]. To assess robustness of the pooled results and explore possible reasons for heterogeneity, additional sensitivity/subgroup analyses were performed according to publication type, patient and disease characteristics, and methodological quality of included studies (risk of bias and random error). All analyses were performed using RevMan 5.1 [33] and StatsDirect [34] software. This work is reported according to the PRISMA guidelines [35].
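The first step described above, summarizing dichotomous data as a risk ratio from event counts, can be sketched as follows. The counts are hypothetical and chosen only for illustration; they are not taken from any included study, and the function name is likewise illustrative. The log-scale standard error produced here is the quantity that would then enter inverse-variance pooling.

```python
import math

Z_975 = 1.959964  # normal quantile for a two-sided 95% interval


def risk_ratio(events_a, total_a, events_b, total_b):
    """Risk ratio of arm A versus arm B with a 95% CI on the log scale."""
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard large-sample SE of log(RR)
    se = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - Z_975 * se)
    upper = math.exp(math.log(rr) + Z_975 * se)
    return rr, se, lower, upper


# Hypothetical counts: 30/100 events in one arm versus 15/100 in the other
rr, se, lower, upper = risk_ratio(30, 100, 15, 100)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}), SE(log RR) = {se:.3f}")
```

This formula requires at least one event in each arm; arms with zero events need a continuity correction or a different method, which is outside the scope of this sketch.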

Previous presentation

Parts of this manuscript have been presented as an oral presentation at the Annual Meeting of the European Group for Blood and Marrow Transplantation 2012 (Abstract 520).