Introduction

Multiple sclerosis (MS) therapies have changed considerably over the last several decades, with the approval of oral disease-modifying therapies (DMTs) following the demonstration of efficacy and safety for the treatment of the relapsing forms of MS (RRMS) [1].

Prior to 2010, only DMTs administered by injection were available as an initial therapeutic option. Later, two oral drugs were approved in European countries as first-line DMTs: delayed release dimethyl fumarate (DMF), also known as gastro-resistant DMF, and teriflunomide (TRF) [2, 3]. Pivotal trials demonstrated the benefits of both DMF and TRF with clinical (i.e., number of clinical relapses and disability accrual) and magnetic resonance imaging (MRI) disease activity, with a generally good safety profile [4,5,6,7,8,9,10,11]. Real-world evidence (RWE) studies have shown that treatment with DMF and TRF controlled similar disease activity (assessed by no evidence of disease activity, NEDA-3, and time to the new first clinical relapse at a 12-month follow-up) and that the two DMTs showed comparable discontinuation rates at a 24-month follow-up [12, 13].

Established efficacy evidence for reducing relapsing activity and disability progression is available for all licensed first-line DMTs, but the need for real data of comparison between the established and the newer first-line DMTs in unselected patient populations is still needed to define treatment sequences and to gather real-world data on long-term outcomes [8,9,10,11, 14,15,16,17,18,19].

In the last several years, the Italian MS Register, the largest national clinical database with about 140 Italian MS centers, offered the opportunity to study real-world clinical outcomes in large cohorts of patients to represent daily clinical practice [1820].

The aim of the current study was to evaluate long-term outcomes of first-line DMTs in terms of time to first relapse, time to confirmed disability progression (CDP), and, additionally, time to discontinuation in RRMS-naïve patients by focusing on the direct comparison between injectable and oral first-line DMTs, namely interferons and glatiramer acetate compared to dimethyl fumarate and teriflunomide [18].

Methods

Population

A multicenter, observational, retrospectively acquired cohort study was utilized for the current study. Anonymized clinical data of patients with RRMS were extracted from the Italian MS Register from their first treatment prescription with injectable and oral DMTs (between January 1, 2010 and December 31, 2017) to their last follow-up with the same treatment [20].

Key eligibility criteria included (1) a diagnosis of RRMS according to the 2010 McDonald criteria [21]; (2) aged between 18 to 55 years at the time of first DMT prescription; (3) start of injectable or oral first-line DMTs between January 1, 2010, and December 31, 2017; (4) continuous exposure to the investigated DMTs for ≥ 6 months; and (5) patients with at least three visits (including baseline) with an Expanded Disability Status Scale evaluation (EDSS).

RRMS-naïve patients who matched the required criteria were divided into two groups for the analyses, the injectable group (IG) and oral group (OG). The IG included RRMS patients who were treated with either Copaxone (40 mg per ml/three times per week subcutaneously and at least 48 h apart) or IFNs (interferon β-1a and interferon β-1b, 30 μg/0.5 mL, once weekly, intramuscularly or interferonβ-1a, either 22 mcg or 44 mcg, three times per week subcutaneously) [22,23,24,25]. The OG included RRMS patients who were treated with either DMF (120 mg twice per day for the first 7 days, then 240 mg twice per day) or TRF (14 mg once per day) [2, 3].

Study Endpoints

The primary study outcome was the evaluation of time to first relapse and time to CDP. The time interval from baseline to the first event (for patients with an event) or to the last evaluation at follow-up (for patients without an event) was examined. Additionally, the time to discontinuation of the first prescribed DMT was evaluated.

Procedures and Outcomes

Patients were included in the study at the initiation of treatment (baseline) and were monitored over their full time on the medication with data collection performed at baseline and approximately every 6 months during the time of exposure. Patients were censored at treatment discontinuation or at their last recorded clinical visit.

A relapse was defined as new symptoms or an exacerbation of existing symptoms persisting for ≥ 24 h in the absence of concurrent illness/fever and occurring ≥ 30 days after a previous relapse. CDP events were defined as ≥ 6-month confirmed increases of either ≥ 0.5 points for patients with a baseline EDSS score > 5.5, ≥ 1.0 point for those with a baseline EDSS score of 1 and 5.5, and ≥ 1.5 points for those with a baseline EDSS score of 0. A minimum of three visits, including the baseline visit, with an EDSS score evaluation, was required. EDSS scores recorded within 30 days after the onset of a relapse were excluded.

Discontinuation of investigated drugs was defined as a gap of treatment for 60 or more days. Time to discontinuation (in months) was measured as the time between the index date and the end of the supply of the prescriptions dispensed.

Statistical Analyses

Data are presented as counts (proportions) for categorical variables and mean (standard deviation, SD) or median (interquartile range, IQR) for continuous variables. Unpaired t tests and Mann Whitney tests were used to compare continuous variables according to their distribution. Chi-square tests were used to compare categorical variables. Univariate non-parametric Kaplan Meier (K-M) curves and log-rank tests were used to evaluate the events under investigation in the entire sample.

A Schoenfeld’s global test was used to verify the proportional hazards assumption for the time-on-treatments. Once the proportionality assumption was verified, a Cox proportional model was built for each investigated outcome in the entire sample. A Cox proportional hazard univariate regression model was used to estimate the hazard ratio (HR) and its 95% confidence interval (CI).

To take into account the imbalance of the two groups, a propensity score (PS) was calculated.

Additionally, a logistic regression was conducted to evaluate all patients using treatment (injectable vs oral) as the independent variable and baseline levels of sex, age, type of MS onset (monofocal or multifocal), EDSS score, number of relapses in the year prior to onset, and disease duration as covariates. Inverse probability of treatment weight (IPTW) and the stabilized inverse probability of treatment weight (SIPTW) were also calculated.

Standardized differences calculated in weighted (using the stabilized weights) and unweighted samples were used to assess the balance of baseline covariates between treated and control.

Multivariable Cox proportional hazard regression models weighted for IPTW were performed to evaluate the relationship between outcomes and treatment groups. HRs and 95% CIs were calculated to evaluate the relationship between outcomes and the treatment group.

For the analysis of relapse outcomes, a negative binomial model and weighted negative binomial model were conducted, using the annual relapse rate as the dependent variable and group as the independent variable.

To better examine the differences between the two treatment strategies in mild-to-moderate patients, subgroup univariate analyses were conducted, stratifying patients on baseline EDSS scores (≤ 2 and > 2) and for number of relapses in the pre-baseline year (1 or more than 1), and a Cox proportional hazard univariate regression model was applied to each subgroup. HRs and their 95% CIs were reported. A sensitivity analysis was conducted on patients with at least 30 months of follow-up.

Furthermore, as a method of correcting the variables that were not measured, the E-value proposed by VanderWeele et al. was calculated [26]. The E-value was defined as the minimum strength of association on the hazard ratio scale that an unmeasured confounding variable would need to have with both the group and the outcome to fully explain the specific group-outcome association, conditional on the measured covariates.

Missing data were handled through multiple imputation. The analysis used normalized weights to approximate the inferences in the data with data missing not at random (MNAR) [27]. The associations between missingness of the baseline data and other demographical and clinical characteristics were calculated with a multivariate logistic regression analysis, as previously published [28, 29].

All results were considered significant at 0.05. Stata 16.1 was used for all analyses.

Protocol Approvals Standard, Registrations, and Patient Consents

Use of the Italian MS Register was approved by the Ethics Committee of the University of Bari (Italy) as the coordinator center (Reference numbers 0055587 and 0052885) and by the local Ethics Committees of all participant centers. The study protocol for the current analysis was also discussed and approved by the Scientific Committee of the Italian MS Register. Each subject enrolled with a diagnosis of MS was required to sign written informed consent to enter the Register. In some centers in which data had been collected before the Register was set up, depending on local laws and regulations, historical data collected retrospectively were also included without informed consent when the patient was not traceable due to death, transfer, or for other reasons. The current report does not contain any individual or identifying information.

Data Availability

Anonymized data will be shared by request from a qualified investigator for the sole purpose of replicating procedures and results presented in the report, provided that the data transfer is in agreement with EU legislation on the general data protection regulation.

Results

Participants

Out of a cohort of 37,012 patients selected from the Italian MS Register, 11,416 started their first DMT during the index window. Out of those subjects, 4602 (3919 in the IG and 683 in the OG) were considered eligible for the analyses and were subsequently enrolled (Fig. 1).

Fig. 1
figure 1

Patients’ selection flow chart. DMT, disease-modifying therapies; EDSS, Expanded Disability Status Scale; IG, injectable group; OG, oral group; RRMS, relapsing remitting multiple sclerosis

Baseline characteristics by group are shown in Table 1. Patients in the IG had a higher rate of women (67.3% vs 63.4%, p < 0.05) and a lower mean age (36.1 ± 10.9 vs 38.9 ± 11.8, p < 0.001, see Table 1).

Table 1 Baseline characteristics of the two groups

Median disease duration was longer in the IG (66 months with a range of 23–249 compared to 70 months with a range of 21–277, p < 0.0001).

The median follow-up of the total cohort was 36 months (IQR = 22–36 months), while the follow-up in the IG was a median of 36 months (IQR = 28–36 months) and the OG median was 19 months (IQR = 12–27 months), p < 0.0001.

Findings on the Pre-matched Samples

During the follow-up, 1477 patients relapsed (n = 1363 (35%) in the IG, n = 114 (17%) in the OG). A log-rank test demonstrated that the risk to experience the first relapse was lower in the OG (p < 0.001, Fig. 2), which was also confirmed by the Cox model (HR = 0.57; CI 95%: 0.47–0.69).

Fig. 2
figure 2

Time to first relapse between the two groups

A negative binomial model and weighted negative binomial model were applied using the annual relapse rate as the dependent variable and group (IG vs. OG) as the independent variable. The incidence rate ratio (OG vs. IG) was 0.63 (95% CI: 0.51–0.78, p < 0.001). After an inverse probability weighting, the incidence rate ratio was 0.65 (95% CI: 0.52–0.82, p < 0.001). Taking this into consideration with the total count of relapses, the OG demonstrated a lower risk than the IG.

The event CDP was observed in 641 patients (n = 574 (15%) in the IG, n = 67 (10%) in the OG). The risk to reach CDP did not differ between the two groups using a log-rank test (p = 0.370, Fig. 3). This was confirmed by the Cox model (HR = 1.12; 95% CI: 0.87–1.45, p = 0.370).

Fig. 3
figure 3

Time to CDP between the two groups. CDP, confirmed disability progression

Finally, DMT discontinuation was observed in 1456 patients (n = 1352 (34.5%) in the IG, n = 104 (15%) in the OG). The risk of DMT discontinuation was lower in the OG as per the log-rank test (p < 0.001), which was confirmed by the Cox model (HR = 0.71; 95% CI: 0.58–0.86, Fig. 4).

Fig. 4
figure 4

Time to first DMT discontinuation between the two groups

Findings After IPW PS Adjustment

Results from the Cox models following the IPW PS adjustment were consistent with the previous models (Fig. 5). The event time to first relapse demonstrated a lower risk in the OG (HR = 0.58; 95% CI: 0.48–0.72, p < 0.001). No differences were found for the risk of CDP between the two groups (HR = 0.94; 95% CI: 0.76–1.29, p = 0.941). However, a lower risk of DMT discontinuation was found in the OG (HR = 0.72; 95% CI: 0.58–0.88, p = 0.002, Fig. 5).

Fig. 5
figure 5

Analysis of treatment effects in time to first relapse, time to CDP, and time to DMT discontinuation. (Asterisk) The treatment effects were explored by a propensity score adjustment in quintiles for age, duration of disease from onset, Expanded Disability Status Scale at baseline, relapses in the previous year, sex, and clinical onset. CI, confidence interval; HR, hazard ratio; CDP, confirmed disability progression; DMT, disease-modifying therapies; IG, injectable group; OG, oral group

Subgroup Analyses

Subgroup analyses were also performed for each investigated outcome to examine any difference in the two groups in mild-to-moderate patients. IG and OG subjects were stratified based on their baseline EDSS score (≤ 2 or > 2) and the number of relapses during the year prior to their baseline visit (1 or > 1, Table 2).

Table 2 Subgroup univariate analysis of treatment effects in terms of risk of first relapse, CDP, and DMT discontinuation

The OG had lower risk of event time to first relapse than the IG, for both EDSS subgroups and the subgroup with one relapse in the year prior to baseline (Table 2). Furthermore, the risk of discontinuation was lower in the OG for subjects with an EDSS ≤ 2 and for both relapse subgroups (Table 2). There were no statistically significant differences found between the two groups for CDP.

Sensitivity Analysis

A sensitivity analysis was conducted on 3007 (139 OG and 2868 GI) subjects with at least 30 months (out of 36) of follow-up. The interquartile range around the median at follow-up visits became much smaller, with median follow-up of 36 months (IQR = 36–36 months). In the IG, the median was also 36 months (IQR = 36–36 months), while in the OG it was 35 months (IQR = 33–36 months).

Before the PS adjustment, hazard ratios were obtained for first relapse (HR = 0.48; 95% CI: 0.34–0.70, p < 0.001), for time to CDP (HR = 1.11; 95% CI: 0.74–1.68, p = 0.613), and for time to first DMT discontinuation (HR = 0.61; 95% CI: 0.43–0.87, p = 0.006). After IPW PS adjustment, hazard ratios were also obtained for time to first relapse (HR = 0.50; 95% CI: 0.35–0.73, p < 0.001), for time to CDP (HR = 1.12; 95% CI: 0.74–1.69, p = 0.604), and for time to first DMT discontinuation (HR = 0.62; 95% CI: 0.44–0.89, p = 0.009).

E-Values for Unmeasured Variables

The observed HRs could be explained away by an unmeasured confounding variable that was associated with in the observed group and the outcome by a E-value of 2.27 for time to relapse, 1.26 for CDP, and 1.82 for DMT discontinuation, but a weak confounding variable could not do this.

Discussion

In this multicenter, observational, retrospectively acquired cohort study, starting oral first-line DMTs (DMF and TRF) was associated with a lower risk of first relapse occurrence and treatment discontinuation rate during the follow-up, in comparison to first-line injectable DMTs, but no significant difference was found in reaching CDP.

In clinical practice, oral DMTs are gradually replacing the injectable DMTs after they were licensed for RRMS treatment because of their improved tolerability by the patients [30]. However, injectable DMTs have been thoroughly investigated in terms of efficacy and are still largely prescribed for their well-characterized safety profile. Moreover, they are still widely considered for patients who intend to become pregnant, having been approved for use during childbearing and breastfeeding (which was recently extended for IFNs) [31, 32].

First-line injectable and oral DMTs have not been compared in non-inferiority trials, nor have they been compared in registry based cohorts studies as first choice DMTs [33, 34].

Considering that big data registries offer the opportunity to study real-world clinical outcomes in large cohorts of patients, the strengths of the current work include the generalizability, the representation of daily clinical MS practice, and the large cohort of patients collected in the Italian MS Register, which is the largest national clinical database with about 140 Italian MS centers connected [20, 35].

The current data suggest that first-line oral DMTs should be a suitable first choice in RRMS patients when, according to a prognostic profile, a first-line DMT is required.

Patient and disease heterogeneity at the initial presentation and during the disease course makes the increasing treatment choices for RRMS valuable, thereby allowing for the personalization of treatment. First, long-term results of the drug safety and efficacy of a compound may inform decision-making. Here, real-world data and well-structured registries are of importance, as is the statistical method employed. PS adjustment allowed for the mitigation of the effect of heterogeneity of the data. However, all PS methods cannot eliminate bias due to unknown or unmeasured confounding variables. Since an observational retrospective study has biases related to data collection, in particular, in the two groups demonstrating a difference in the follow-up period, the survival analysis allows this to remain under control. Moreover, the sensitivity analysis conducted was limited to patients with at least 30 months of follow-up. The possible effect of unmeasured confounding variables was calculated using an E-value [36,37,38,39].

The primary limitations of our study pertain to the observational nature of the data. In addition, MRI activity was not evaluated, nor was there a correction for MRI parameters in the PS model. This may be a limitation, as the current criteria for defining the efficacy of a treatment of MS use composite scores, as no evidence of disease activity (NEDA3) must take into account the presence of new or active (enhancing) lesions on MRI scans [40]. This is, indeed, missing in many real-world studies. The level of evidence of a descriptive study without MRI data is limited and cannot replace a non-inferiority trial.

The current study was not designed to compare the safety of the two approaches because the safety data were not sufficiently complete to enable such an analysis. Further research is needed to more accurately identify patients who are most likely to benefit from these therapies.