Introduction

Lymphomas are the second most common indication for autologous hematopoietic stem cell (HSC) transplantation (HSCT). The most common method for mobilizing HSCs from the peripheral blood is treatment with granulocyte-colony stimulating factor (G-CSF) alone or combined with chemotherapy [1, 2]. However, a sizable minority of patients fail to mobilize sufficiently with G-CSF-based regimens [3, 4].

Plerixafor has a mode of action different from other HSC mobilizing agents and acts by binding to chemokine (C-X-C motif) receptor 4 (CXCR4), preventing the binding of its ligand stromal cell-derived factor 1 (SDF-1, now C-X-C motif chemokine 12, CXCL12) and thereby inhibiting events downstream of CXCL12 including SDF-1-mediated G-protein activation, receptor internalization, calcium flux, and chemotaxis [5, 6]. The CXCL12/CXCR4 interaction is an integral part of the mechanism of homing and retention of HSC in the bone marrow and inhibition of this interaction by plerixafor mobilizes HSCs from the bone marrow [7, 8]. Unlike cytokines used for HSC mobilization (e.g., G-CSF), plerixafor is not a growth factor and does not cause cell proliferation or expansion. Therefore, the approved use for plerixafor is in combination with G-CSF [9, 10].

As there is a theoretical risk of tumor cell mobilization with any stem cell mobilization method for HSCT, the European Medicines Agency requested a postapproval analysis of plerixafor to evaluate the possible long-term negative impact related to tumor cell mobilization.

The European Society for Blood and Marrow Transplantation (EBMT) database is run and accessed through a project manager internet server and is accessible to all EBMT registered centers. Our analysis of the EBMT registry data for long-term clinical outcomes included progression-free survival (PFS), overall survival (OS), and cumulative incidence of relapse (CIR) in patients with lymphoma. Patients who received plerixafor for mobilization of HSCs were matched by propensity scoring with patients who received other standard mobilization methods. Only patients who had a subsequent HSCT were included in the study.

Methods

Study design

This was an international, multicenter, non-interventional registry study, with patient follow-up ranging from 3.5 to 7.5 years, which evaluated the outcomes for patients with lymphoma who received G-CSF + plerixafor compared with other mobilization methods for HSC mobilization and who received an autologous HSCT (ClinicalTrials.gov number NCT01362972). All patients ≥18 years from the EBMT registry, with a diagnosis of lymphoma who were considered poor mobilizers and had undergone their first autologous HSCT between 2008 and 2012, were considered eligible for the study. Patients in the plerixafor groups who were poor mobilizers were those treated according to the label. Patients enrolled in the registry and evaluated in this study were from Austria, Belgium, Bulgaria, Czech Republic, Finland, France, Germany, Greece, Hungary, Ireland, Israel, Italy, Netherlands, Poland, Romania, Spain, Sweden, Switzerland, and the United Kingdom.

The study was conducted in accordance with the Declaration of Helsinki and the International Conference on Harmonization Guidelines for Good Clinical Practice. Approval of the protocol was obtained from all participating sites, local governmental authorities, and Institutional Review Boards. This was a non-inferiority study, with a non-inferiority margin of a 30% increase in the hazard of a PFS or OS event, corresponding to a hazard ratio (HR) upper bound of 1.3; no lower limit was set.

Poor mobilizers

Predicted poor mobilizers were defined as patients who had received prior irradiation to marrow bearing areas or had high exposure to marrow-damaging chemotherapy. Proven poor mobilizers were defined as patients who in a previous mobilization attempt failed to mobilize sufficient CD34+ cells in peripheral blood to proceed to apheresis or to proceed to transplantation, or who, in the current mobilization, failed to achieve a sufficient increase in peripheral blood CD34+ cells at the predicted time for peak mobilization [11]. In this study, only poor mobilizers (either predicted or proven) have been considered in the propensity score matching and analyses.

Data collection

The data were entered, managed, and maintained centrally in an internet accessible database. Variables present on the EBMT Minimum Essential Data A and B forms were used to derive the data for the study and Medical C form was used to obtain plerixafor data.

Outcomes

The primary efficacy outcomes were OS, PFS, and CIR. Secondary efficacy outcomes were hematological recovery (time to achieve absolute neutrophil counts of ≥0.5 × 10⁹/L and platelet counts of ≥50 × 10⁹/L). All transplant complications occurring within 100 days of transplantation were recorded.

The following mobilization regimens were assessed:

  • G-CSF + plerixafor versus G-CSF alone;

  • G-CSF + plerixafor versus G-CSF + chemotherapy;

  • G-CSF + plerixafor + chemotherapy versus G-CSF + chemotherapy.

Graft failure was defined as: (1) the loss of the graft with neutrophils reaching ≥0.5 × 10⁹ cells/L but subsequently decreasing to a lower level of cells until additional treatment for engraftment was provided, or (2) no engraftment, where neutrophils never reached ≥0.5 × 10⁹ cells/L.

Statistical analyses

Due to the observational nature of the study, no formal statistical hypothesis testing was planned with adequate power or Type 1 error control.

A propensity score method was used to identify study comparison groups that were balanced with respect to baseline characteristics, including demographics, lymphoma type, disease characteristics and staging, prior treatment characteristics, and disease status [12]. The baseline variables and patient demographics used for propensity score matching are shown in Table 1. Only patients who were identified as a “proven or predicted poor mobilizer” were included in the analysis.

Table 1 Patient demographics in the matched comparison groups used for propensity score matching

A single imputation approach was implemented to create complete data sets for analyses. Propensity scores were then estimated using logistic regression models. Matches for plerixafor patients were identified from the non-plerixafor groups based on the estimated propensity scores. Matching was performed without replacement. Model success was judged by whether balance was achieved between the matched plerixafor and control samples.
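The matching steps described above can be illustrated with a minimal sketch. The source does not state the matching algorithm, the use of a caliper, or the balance metric, so the greedy nearest-neighbor rule, the caliper of 0.2, and the standardized mean difference used below are assumptions made for illustration only. All function names and the toy data are hypothetical.

```python
import math


def propensity(w, x):
    """Estimated probability of receiving the treatment given covariates x."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit logistic regression by batch gradient ascent.

    Returns the weights as [intercept, coef_1, ..., coef_p].
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - propensity(w, xi)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def match_without_replacement(treated, controls, caliper=0.2):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    treated, controls: dicts mapping patient id -> propensity score.
    Each control is used at most once (matching without replacement).
    The caliper is an assumption; the source does not mention one.
    """
    available = dict(controls)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


def standardized_mean_difference(a, b):
    """Balance diagnostic for one covariate (assumed metric, not from the source)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    pooled = math.sqrt((var_a + var_b) / 2.0)
    return (mean_a - mean_b) / pooled if pooled else 0.0


# Hypothetical toy data: one standardized covariate per patient; 1 = plerixafor.
X = [[0.0], [1.0], [2.0], [3.0], [1.5], [2.5], [0.5], [3.5]]
treatment = [0, 0, 1, 1, 1, 0, 0, 1]

weights = fit_logistic(X, treatment)
scores = [propensity(weights, x) for x in X]
treated = {i: s for i, (s, t) in enumerate(zip(scores, treatment)) if t == 1}
controls = {i: s for i, (s, t) in enumerate(zip(scores, treatment)) if t == 0}
pairs = match_without_replacement(treated, controls)
print(pairs)
```

In practice, balance would be checked by comparing the standardized mean difference of each baseline covariate before and after matching; a value close to zero after matching suggests the covariate is balanced.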

Following the propensity score analysis, the outcomes for each mobilization treatment group were analyzed in the matched, comparable groups. A Cox proportional hazards model with covariates was used for OS and PFS. The HR for the effect of treatment and its 95% confidence interval (CI) were calculated. Potential covariates included interval from diagnosis to transplantation, disease status at conditioning and conditioning regimen, and disease status at time of transplantation.
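As a rough illustration of how an HR and its 95% CI relate to the non-inferiority margin of 1.3 stated in the study design, the sketch below converts a Cox model coefficient (log HR) and its standard error into an HR with a Wald-type CI and compares the upper limit with the margin. It assumes that non-inferiority is judged on the upper 95% confidence limit, which is our reading of the text. The function names and input values are hypothetical and are not study results.

```python
import math

NI_MARGIN = 1.3  # non-inferiority margin on the hazard ratio, as stated in the text


def hr_confidence_interval(log_hr, se, z=1.959964):
    """Return (HR, lower, upper) from a log hazard ratio and its standard error.

    Uses a Wald-type interval: exp(log_hr +/- z * se), with z for a 95% CI.
    """
    return (
        math.exp(log_hr),
        math.exp(log_hr - z * se),
        math.exp(log_hr + z * se),
    )


def is_non_inferior(upper_bound, margin=NI_MARGIN):
    """Non-inferiority is concluded only if the upper CI limit lies below the margin."""
    return upper_bound < margin


# Hypothetical inputs for illustration only.
hr, lower, upper = hr_confidence_interval(log_hr=0.0, se=0.2)
print(round(hr, 2), round(lower, 2), round(upper, 2), is_non_inferior(upper))
```

The example shows why a small matched sample can fail to demonstrate non-inferiority even when the point estimate suggests no difference: a wide CI can cross the margin although the HR itself is 1.0.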

Survival curves were developed for each treatment group using nonparametric Kaplan–Meier estimates [13], as well as survival rates at 6 months, and at 1, 2, 3, 4, and 5 years.
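For readers unfamiliar with the Kaplan–Meier method, a minimal sketch of the product-limit estimator is given below, including a helper that reads the estimated survival at fixed time points such as 6 months or 1 to 5 years. The function names and the follow-up data are hypothetical; this is not the software used in the study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time per patient (years).
    events: 1 if the event was observed, 0 if the patient was censored.
    Returns a list of (time, survival) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_removed
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function, 1.0 before the first event)."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


# Hypothetical follow-up data (years); 1 = death or progression, 0 = censored.
times = [0.4, 0.8, 1.2, 2.5, 3.0, 4.1]
events = [1, 0, 1, 1, 0, 0]
curve = kaplan_meier(times, events)
for year in (0.5, 1, 2, 3):
    print(year, round(survival_at(curve, year), 3))
```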

A competing risk model was developed for CIR; death without prior progression/relapse was treated as a competing event. The 95% CI and cumulative incidence at each year post transplantation were estimated.
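The competing risk approach can be sketched with the standard nonparametric cumulative incidence estimator (Aalen–Johansen type), in which relapse can only occur in patients who are still alive and relapse-free. The source does not name the estimator used, so this choice is an assumption; the function name, event codes, and data are hypothetical.

```python
def cumulative_incidence(times, causes, cause=1):
    """Nonparametric cumulative incidence of one cause with competing events.

    times: follow-up time per patient (years).
    causes: 0 = censored, 1 = relapse/progression, 2 = death without relapse.
    Returns a list of (time, cumulative incidence of `cause`) at each event time.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any cause so far
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_any = n_removed = 0
        while i < len(data) and data[i][0] == t:
            c = data[i][1]
            d_cause += int(c == cause)
            d_any += int(c != 0)
            n_removed += 1
            i += 1
        if d_any:
            # Increment uses the event-free probability just before time t.
            cif += event_free * d_cause / n_at_risk
            event_free *= 1.0 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_removed
    return curve


# Hypothetical data for illustration only.
times = [0.5, 1.0, 1.5, 2.0, 3.0]
causes = [1, 2, 0, 1, 0]
relapse_curve = cumulative_incidence(times, causes, cause=1)
death_curve = cumulative_incidence(times, causes, cause=2)
print(relapse_curve[-1], death_curve[-1])
```

Treating death without relapse as a competing event, rather than as censoring, avoids overestimating relapse incidence, which is what happens when one minus the Kaplan–Meier estimate is used in this setting.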

Sample size was estimated using the following assumptions: 15% of transplanted lymphoma patients would receive G-CSF alone and 85% would receive G-CSF + chemotherapy; 10% of patients transplanted from each regimen would receive plerixafor treatment; and 70% of patients receiving plerixafor would be matched at a ratio of 1:2 plerixafor to comparator.
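The arithmetic implied by these assumptions can be sketched as follows. The total number of transplanted patients is not given in this section, so the value of 4000 below is purely hypothetical, and the reading that 10% of each regimen stratum receives plerixafor is our interpretation of the assumptions. The function name is also hypothetical.

```python
def expected_matched_sizes(n_transplanted,
                           share_gcsf_alone=0.15,
                           plerixafor_rate=0.10,
                           matched_fraction=0.70,
                           comparators_per_plerixafor=2):
    """Expected cohort sizes per mobilization regimen under the stated assumptions."""
    strata = {
        "G-CSF alone": share_gcsf_alone,
        "G-CSF + chemotherapy": 1.0 - share_gcsf_alone,
    }
    result = {}
    for name, share in strata.items():
        plerixafor = n_transplanted * share * plerixafor_rate
        matched = plerixafor * matched_fraction
        result[name] = {
            "plerixafor": round(plerixafor),
            "matched_plerixafor": round(matched),
            "matched_comparators": round(matched * comparators_per_plerixafor),
        }
    return result


# 4000 is a hypothetical registry size, not a figure from the study.
sizes = expected_matched_sizes(4000)
for regimen, counts in sizes.items():
    print(regimen, counts)
```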

Results

Participants and demographics

Overall, 3764 patients were screened and 3749 were eligible to be included in the study (Fig. 1). These included 140 patients treated with G-CSF + plerixafor, 173 patients treated with G-CSF + chemotherapy + plerixafor, 549 patients treated with G-CSF alone, and 2887 patients treated with G-CSF + chemotherapy. The propensity score matching of predicted and proven poor mobilizers identified matched groups for the comparative analysis (Table 1). The number of patients classified as predicted or proven poor mobilizers was 136 treated with G-CSF + plerixafor, 173 treated with G-CSF + chemotherapy + plerixafor, 54 treated with G-CSF alone, and 245 treated with G-CSF + chemotherapy.

Fig. 1 Patient eligibility and treatment. Key: G, G-CSF; P, plerixafor; C, chemotherapy

After propensity scoring, 70 patients in the G-CSF + plerixafor cohort were matched with 36 patients in the G-CSF alone cohort, 124 patients were matched with 124 in the G-CSF + plerixafor versus G-CSF + chemotherapy comparison, and 130 patients were matched with 130 in the G-CSF + plerixafor + chemotherapy versus G-CSF + chemotherapy comparison. Disease history and baseline demographics of the patients are shown in Table 1.

The proportion of patients who were proven poor mobilizers was greater in the plerixafor cohorts (ranging from 97.1 to 98.4%) compared with the comparator cohorts (ranging from 68.3 to 75.0%). More patients in the plerixafor groups failed to mobilize sufficient CD34+ cells at the predicted peak mobilization time compared with patients in the comparator groups (Table 2).

Table 2 Mobilization characteristics for the matched comparator groups

Primary outcomes

Progression-free survival

Estimated PFS rates at 3 years are shown in Table 3 and were similar between comparison groups. The Kaplan–Meier estimates for PFS were also similar for all comparisons: G-CSF + plerixafor versus G-CSF alone (Fig. 2a), G-CSF + plerixafor versus G-CSF + chemotherapy (Fig. 2b), and G-CSF + plerixafor + chemotherapy versus G-CSF + chemotherapy (Fig. 2c).

Table 3 Primary outcomes (progression-free survival, overall survival, and cumulative incidence of relapse) for each of the comparator groups

Fig. 2 Progression-free survival for each of the comparison groups: a G-CSF + plerixafor versus G-CSF alone (comparison 1); b G-CSF + plerixafor versus G-CSF + chemotherapy (comparison 2); and c G-CSF + plerixafor + chemotherapy versus G-CSF + chemotherapy (comparison 3)

Overall survival

Due to censoring, median follow-up and interquartile ranges were not calculable for some comparison groups. Kaplan–Meier estimates of OS were similar for the G-CSF + plerixafor group versus the G-CSF alone group (comparison 1, Fig. 3a). Kaplan–Meier estimates of OS for the G-CSF + plerixafor group versus the G-CSF + chemotherapy group (comparison 2, Fig. 3b) were also similar, although OS in the G-CSF + plerixafor group trended lower after 0.5 years. For the G-CSF + plerixafor + chemotherapy group compared with the G-CSF + chemotherapy group, OS was also similar (comparison 3, Fig. 3c). As the upper confidence limit of the HR was >1.3, the predetermined non-inferiority boundary, non-inferiority of plerixafor was not demonstrated for any of the comparisons. Estimated OS rates at 3 years are shown in Table 3.

Fig. 3 Overall survival for each of the comparison groups: a G-CSF + plerixafor versus G-CSF alone (comparison 1); b G-CSF + plerixafor versus G-CSF + chemotherapy (comparison 2); and c G-CSF + plerixafor + chemotherapy versus G-CSF + chemotherapy (comparison 3)

Cumulative incidence of relapse rates

Cumulative incidence of relapse rates at 3 years are shown in Table 3. The CIR trended higher in the G-CSF alone group compared with the G-CSF + plerixafor group until year 5. However, the CIR was similar between the G-CSF + plerixafor group and the G-CSF + chemotherapy group, and between the G-CSF + plerixafor + chemotherapy group and the G-CSF + chemotherapy group (Fig. 4).

Fig. 4 Cumulative incidence of relapse for each of the comparison groups: a G-CSF + plerixafor versus G-CSF alone (comparison 1); b G-CSF + plerixafor versus G-CSF + chemotherapy (comparison 2); and c G-CSF + plerixafor + chemotherapy versus G-CSF + chemotherapy (comparison 3)

Secondary outcomes

Post transplantation

Adverse events occurring in more than one patient in any treatment group up to 100 days post first transplantation are shown in Table 4. Infections and infestations were the most common system organ class complication in all plerixafor and comparator groups.

Table 4 Adverse events occurring in more than one patient in any treatment group up to 100 days post first transplantation

Engraftment was reported in ≥88% of patients in each treatment group of the three comparisons. Similarly, engraftment of both platelets and neutrophils was reported in ≥82% of patients in each treatment group. The median number of days to reach a platelet count of ≥20 × 10⁹/L was 14 days for each of the plerixafor groups and 12–13 days for the respective comparator groups. The median number of days to reach a neutrophil count of ≥0.5 × 10⁹/L was 11–12 days for each group.

Discussion

This study was a postapproval measure to investigate clinical disease outcome as a surrogate of potential enhancement of tumor cell release when combining G-CSF + plerixafor. The study was not designed to evaluate the benefit of stem cell mobilization with plerixafor compared with other methods. In general, no major differences were observed in PFS, OS, or CIR between the plerixafor and the comparison groups.

In line with the plerixafor label, all patients in the study had to be poor mobilizers, either proven poor mobilizers, based on their mobilization history, or predicted poor mobilizers, based on exposure to high-dose chemotherapy. However, despite propensity score matching, which aimed to balance the groups, there were still substantially more proven poor mobilizers in the plerixafor cohorts, which may have led to an imbalance between comparison groups. Consistent with this, the median pre-apheresis CD34+ cell counts at the predicted time of peak mobilization during the current mobilization were lower in the plerixafor cohorts than in the comparison cohorts. Therefore, it appears that the groups were not well balanced for disease and HSCT risk.

Others have shown that poor mobilization of autologous stem cells in patients with lymphoma, defined as the inability to obtain ≥1 × 10⁶ CD34+ cells/kg body weight, resulted in substantially lower PFS and OS compared with good mobilization [14, 15]. Therefore, in our study, the higher proportion of proven poor mobilizers in the plerixafor groups may, at least in part, explain the PFS, OS, and CIR results versus those in the comparison groups. Furthermore, two confounders might have skewed the analysis of the data in the current study. First, poor mobilization could be an indicator of more severe or prognostically worse disease. Therefore, the post-plerixafor outcomes trending toward slightly worse OS, PFS, and CIR could be the result of more advanced lymphoma disease and not a trend toward inferiority of plerixafor-mobilized grafts. Second, data on the lymphoma biology were not available, so it could not be confirmed that the plerixafor and comparator groups were balanced in relation to the stage of the lymphoma disease.

Engraftment was reported in ≥88% of patients in each treatment group of the three comparisons, and the median number of days to achieve the target levels of platelets and neutrophils was similar for all cohorts (12–14 days and 11–12 days, respectively), which was in line with findings from a previous study [16].

In support of our findings, results from a 5-year, long-term, phase 3 follow-up study (not restricted to poor mobilizers) suggested that the use of G-CSF + plerixafor did not have a negative effect on PFS and OS in patients with lymphoma, with more than half of patients with lymphoma remaining alive 5 years post transplantation [17].

Propensity scores are increasingly used in nonrandomized studies to assess marginal treatment effects and to reduce potential bias and imbalances of baseline covariates [12]. The underlying assumption is that no unmeasured covariate affecting treatment choice confounds the association between treatment and outcome [18]. Unobserved differences between groups cannot be adjusted for by propensity scores; this is therefore both a source of potential bias and a limitation of the method [19]. Despite the modest sample size in the comparisons resulting from propensity score matching, an adequate balance was maintained in our study for certain key disease characteristics collected in the database. However, we acknowledge that some key factors affecting mobilization outcome were unavailable in the database, such as the number of rounds and the type of chemotherapy administered and, for those who received radiotherapy, the extent of the irradiated fields. These limitations in data collection may have had a major impact on the outcomes in our study.

Moreover, in line with the reimbursement criteria for the drug and the obligation on some clinicians to closely manage treatment costs [20], plerixafor may have been selectively given to patients who were the poorest mobilizers at highest risk, and this may have been a factor in the trend for slightly worse outcomes in the plerixafor-treated groups. Conversely, only patients with successful mobilization were included in the study, which may have inadvertently introduced selection bias in favor of the non-plerixafor cohorts. Furthermore, our study was not adequately powered, as there were small to moderate numbers of patients in each cohort after propensity score matching. Although every effort was made to remove potential biases in this study, biases may not have been removed completely due to the observational nature of the data.

Conclusions

Although non-inferiority of the plerixafor groups could not be formally shown, PFS, OS, and CIR were numerically similar between comparators. Results from this study should be interpreted with caution due to a number of limitations. These include an imbalance between treatment groups due to the greater proportion of proven poor mobilizers in the groups treated with plerixafor, the limited scope of the EBMT database, the lack of power, and the observational nature of the study. In particular, a higher proportion of patients treated with plerixafor failed to mobilize sufficient CD34+ cells at the predicted time. Therefore, these patients represent a group that is likely predisposed to worse outcomes. Infections and infestations were the most common system organ class post-transplant complication for plerixafor and comparator cohorts. Without plerixafor treatment, it is likely that some patients would not have proceeded to transplantation. Altogether, this study does not provide evidence that plerixafor mobilization is associated with an increased risk of relapse or post-transplant toxicities in patients undergoing HSCT for lymphoma.