Oral therapies for treatment of relapsing–remitting multiple sclerosis in Austria: a 2-year comparison using an inverse probability weighting method

Objectives: To compare the efficacy of, and the frequencies of and reasons for treatment interruption with, fingolimod (FTY), dimethyl fumarate (DMF) and teriflunomide (TERI) in a nationwide observational cohort. Materials and methods: We analysed two cohorts of patients with relapsing–remitting multiple sclerosis (RRMS) who had started treatment with FTY, DMF or TERI, as documented in the Austrian MS Treatment Registry (AMSTR) since 2014: patients who stayed on therapy for at least 24 months (24 m cohort) and patients with at least one follow-up visit after the start of treatment (total cohort). The 24 m cohort included 629 RRMS patients: 295 in the FTY, 227 in the DMF and 107 in the TERI group. We used multinomial propensity scores for inverse probability weighting in generalized linear and Cox proportional hazards models to correct for the bias of this non-randomised registry study. Results: Estimated mean annualized relapse rates (ARR) over 24 months were 0.13 for FTY, 0.09 for DMF and 0.11 for TERI treatment. For TERI in comparison with DMF, we observed a higher probability of treatment interruption (p = 0.023) and reduced sustained Expanded Disability Status Scale (EDSS) regression for 12 (p = 0.016) and 24 weeks (p = 0.031); for DMF versus FTY, we observed reduced sustained EDSS progression for 12 weeks (p = 0.02). Conclusions: Relapse rates under treatment with FTY, DMF and TERI were similar. Patients treated with DMF showed less sustained disability progression for 12 weeks than FTY-treated patients. However, FTY and DMF treatment were associated with a higher likelihood of EDSS regression for 12 and 24 weeks and a lower probability of treatment interruption than TERI treatment. Electronic supplementary material: The online version of this article (10.1007/s00415-020-09811-6) contains supplementary material, which is available to authorized users.

The objective of our study was, first, to compare the efficacy of FTY, DMF or TERI and, second, to analyse the probability for stopping, pausing or switching (treatment interruption) of these therapies in a nationwide observational cohort using prospectively collected data from a real-life setting.

Data collection
The Austrian MS Treatment Registry (AMSTR) [20,21] was established in 2006 to maintain quality control and comply with the reimbursement regulations of the Austrian sick funds. It allows clinical data to be collected, indications and the clinical profiles of treated patients to be assessed, and safety to be monitored in real life. The AMSTR is part of the dense MS network in Austria, which comprises all MS clinics in neurological departments and some dedicated office-based neurologists. In addition, prescription of disease-modifying therapies (DMTs) for MS is restricted to MS centers. Thus, prescriptions and treatment documentation are evenly distributed across Austria. The AMSTR complies with Austrian laws on bioethics and was approved by the ethics committee of the Medical University of Vienna (EC number 2096/2013).
AMSTR documents anonymous baseline data, including MS onset and duration, relapses in the prior 12 months, EDSS, gross MRI activity and previous disease-modifying therapies (DMT). Follow-up data (relapses, EDSS, adverse events [AEs], change or discontinuation of treatment) must be documented every 3-6 months; the median visit interval was 3.8 months for FTY, 4 months for DMF and 3.8 months for TERI. Each relapse had to be confirmed by a neurologist at the MS center and documented in the AMSTR. Documentation required relapse onset, EDSS and use/dosage of i.v. methylprednisolone treatment. Besides the fact that use of the AMSTR is mandatory for reimbursement, a special quality-related feature of the AMSTR is external and independent data monitoring, which improves data management in terms of completeness and plausibility of the documented data.
In 2011, the European Medicines Agency (EMA) approved FTY with the same indication criteria as natalizumab. Reimbursement for FTY in Austria adheres to this approval. Thus, FTY-treated patients in Austria had to have either at least one relapse in the prior 12 months despite treatment with interferon beta or glatiramer acetate and at least 9 T2 lesions or at least one gadolinium-enhancing lesion on recent brain MRI ("indication A"), or two or more severe relapses in the preceding treatment-naïve 12 months and one or more gadolinium-enhancing lesions on brain MRI or a significant increase in T2 lesion load as compared to a previous recent MRI ("indication B").
The EMA approved TERI in 2013 and DMF in 2014, each for the treatment of adult patients with RRMS.
We investigated a total cohort of 1530 patients who started treatment with FTY, DMF or TERI in the AMSTR at any time since 2014. The AMSTR covers approximately 70% of all prescriptions of the three oral agents in Austria. For the purpose of this study, we analysed the data of these patients in two cohorts. The first cohort stayed on therapy for at least 24 months (24 m cohort) and was used to compare the efficacies of the different oral drugs. The second cohort was the total cohort, defined by the availability of at least one follow-up visit and therefore also including the 24 m cohort. The total cohort was used to analyse the frequency, cause and risk of treatment interruption.
The primary outcome measure was the ARR during treatment with FTY, DMF or TERI over 2 years after initiation of therapy. Relapses were defined as new or worsening neurological symptoms lasting for at least 24 h in the absence of fever.
Further outcome measures were the total number of relapses, EDSS progression or regression confirmed after 12 and 24 weeks, and EDSS changes during the 2-year period (difference between EDSS at the last visit and at baseline). Sustained disability progression or regression was defined as an increase or decrease from baseline of at least 1.0 point in the EDSS score (or at least 0.5 points for patients with a baseline EDSS score greater than 5.5) that persisted for at least 12 or 24 weeks.
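The threshold rule above can be sketched in code. The following is a minimal illustration only: the function names, the visit format (week, EDSS) and the way confirmation is checked (every visit up to the first one at least 12 or 24 weeks after onset must still meet the threshold) are assumptions made for this example, not a description of how the registry data were actually processed.

```python
def edss_change_threshold(baseline_edss):
    """Minimum EDSS change from baseline that counts as progression or regression."""
    return 0.5 if baseline_edss > 5.5 else 1.0


def first_sustained_change(baseline_edss, visits, confirm_weeks=12, direction=1):
    """Return the onset week of the first sustained EDSS change, or None.

    visits: list of (week, edss) tuples sorted by week (hypothetical format).
    direction: +1 for progression (increase), -1 for regression (decrease).
    """
    step = edss_change_threshold(baseline_edss)

    def meets(edss):
        return direction * (edss - baseline_edss) >= step

    for i, (onset_week, edss) in enumerate(visits):
        if not meets(edss):
            continue
        # Walk through later visits: the change must persist at every visit
        # until one lies at least `confirm_weeks` after the onset visit.
        for week, value in visits[i + 1:]:
            if not meets(value):
                break
            if week - onset_week >= confirm_weeks:
                return onset_week
    return None
```

Calling the function with `confirm_weeks=24` gives the 24-week variant, and `direction=-1` gives sustained regression instead of progression.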
For analyses of the treatment interruption, we defined three causes, namely (a) stopping treatment as permanent treatment interruption in the AMSTR; (b) pausing treatment as treatment interruption and restarting with the same treatment; and (c) switching treatment as treatment interruption and starting with a new medication in the AMSTR.
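As a small illustration, the three causes can be written as a classification rule. The function and argument names are hypothetical, and representing "no further treatment documented" as `None` is an assumption of this sketch.

```python
from typing import Optional


def classify_interruption(interrupted_drug: str, next_drug: Optional[str]) -> str:
    """Classify a treatment interruption as 'stop', 'pause' or 'switch'.

    next_drug is the treatment documented after the interruption,
    or None if no further treatment was documented.
    """
    if next_drug is None:
        return "stop"      # (a) permanent treatment interruption
    if next_drug == interrupted_drug:
        return "pause"     # (b) restart with the same treatment
    return "switch"        # (c) start of a new medication
```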

Statistical methods
All effects estimated in comparing treatment groups were average treatment effects (ATE). To control for bias arising from non-randomised assignment to the treatment groups, we used inverse probability weighting (IPW), with propensity score (PS) matching as a comparison method. Because three groups were compared, we estimated multinomial propensity scores as described by McCaffrey [22]. Propensity scores for treatment with FTY, DMF and TERI were estimated for all patients with the following baseline parameters as independent variables: age, duration of disease, number of relapses in the 12 months prior to baseline, EDSS, presence of at least 9 MRI T2 lesions and at least one contrast-enhancing MRI lesion, and previous therapy. These variables were included in the model because of their clinical meaning, independently of their significance as predictors in the model. In this way, we sought to avoid both being misled by false-positive predictors in a multiple-testing situation and omitting relevant variables through a type II (beta) error. Treatment groups were balanced for all variables after scoring (Table S1). Our PS estimations for IPW were optimized for the Kolmogorov-Smirnov (KS) statistic, because this method compares the entire distribution rather than just the mean. A generalized linear model (GLM) with relapse count as the Poisson-distributed dependent variable and log-transformed observation time in years as the offset variable was used to estimate the treatment effect on the ARR in the 24-month observation period. To address a potential immortal time bias, we analysed ARR in an observation period without a time limit as a secondary analysis.
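To make the weighting step concrete, the sketch below derives ATE weights from already estimated multinomial propensity scores and uses them for a weighted relapse rate. It is a simplification under stated assumptions: the function names and data layout are hypothetical, the example values are invented and are not study data, and the rate shown corresponds to a weighted Poisson model with treatment as the only factor and log observation time as offset, whereas the models described above also included the baseline covariates.

```python
def ate_weights(treatments, propensities):
    """Inverse probability weights for the average treatment effect.

    treatments: list of received treatment labels, one per patient.
    propensities: list of dicts mapping each treatment label to the
        estimated probability of receiving it given baseline covariates.
    """
    weights = []
    for received, scores in zip(treatments, propensities):
        if abs(sum(scores.values()) - 1.0) > 1e-6:
            raise ValueError("propensity scores must sum to 1 for each patient")
        weights.append(1.0 / scores[received])
    return weights


def weighted_arr(treatments, relapses, years, weights):
    """Weighted annualized relapse rate per treatment group.

    Equals the weighted sum of relapses divided by the weighted sum of
    observation time (in years) within each group.
    """
    numerator, denominator = {}, {}
    for group, count, exposure, weight in zip(treatments, relapses, years, weights):
        numerator[group] = numerator.get(group, 0.0) + weight * count
        denominator[group] = denominator.get(group, 0.0) + weight * exposure
    return {group: numerator[group] / denominator[group] for group in numerator}


# Invented example values, for illustration only.
example_treatments = ["FTY", "FTY", "DMF", "TERI"]
example_propensities = [
    {"FTY": 0.5, "DMF": 0.25, "TERI": 0.25},
    {"FTY": 0.25, "DMF": 0.5, "TERI": 0.25},
    {"FTY": 0.25, "DMF": 0.5, "TERI": 0.25},
    {"FTY": 0.5, "DMF": 0.25, "TERI": 0.25},
]
example_weights = ate_weights(example_treatments, example_propensities)
example_arr = weighted_arr(example_treatments, [1, 0, 0, 1], [2.0, 2.0, 2.0, 2.0], example_weights)
```

Patients who were unlikely to receive the treatment they actually received get larger weights, so each weighted group resembles the whole population at baseline.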
Augmented inverse probability weighting was used to analyse the change of EDSS from baseline to the last visit in the 24 months observation period, so the mean differences between last visit and baseline (negative as improvement, positive as worsening) could be estimated for each treatment from the potential means generated by the model.
We used Cox proportional hazards models for analysing EDSS progression and regression confirmed after 12 and 24 weeks, and the relapse hazard in the 24 months observation period.
Cox proportional hazards models were also used to analyse treatment interruptions in the patient cohort with at least one follow-up visit.
All models included treatment as a categorical factor and inverse multinomial propensity scores as weights, in keeping with the survey character of the study. All variables used for propensity scoring were also used in the outcome models as independent variables to obtain adjusted treatment effects. We applied this doubly robust approach because the ATE estimator remains consistent if at least one of the two models, the propensity score model or the outcome regression, is correctly specified. Thus, misspecification of only one of the two models would not compromise the consistency of the ATE estimator [23].
For all Cox models, the proportional hazards assumption was verified using Schoenfeld residuals; no significant deviations from the assumption were found.
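The doubly robust property can be illustrated with the standard textbook form of the augmented IPW estimator of a potential outcome mean. The notation here is introduced for illustration and is not taken from the original analysis: T_i is the treatment received by patient i, X_i the baseline covariates, Y_i the outcome, \hat{e}_t the estimated propensity score for treatment t and \hat{m}_t the fitted outcome regression under treatment t.

```latex
\hat{\mu}_t = \frac{1}{n} \sum_{i=1}^{n} \left[ \hat{m}_t(X_i)
  + \frac{\mathbb{1}\{T_i = t\}}{\hat{e}_t(X_i)} \bigl( Y_i - \hat{m}_t(X_i) \bigr) \right],
\qquad
\widehat{\mathrm{ATE}}_{t,t'} = \hat{\mu}_t - \hat{\mu}_{t'}
```

If the outcome regression is correct, the weighted residual term averages to zero whatever the propensity model; if the propensity model is correct, the weighted residuals cancel the error of the outcome regression. Either way the estimator remains consistent.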

Results
The 24-month continuous treatment cohort included 629 RRMS patients: 295 in the FTY, 227 in the DMF, and 107 in the TERI group. The baseline data of the 629 patients are summarized in Table 1. When ARR was analysed from the GLM over an observation period without a time limit, results were similar, and no significant differences were observed between treatments. Finally, PS matching produced differences between treatments similar to those obtained with IPW, likewise without statistical significance.
Estimated mean relapse counts from the GLM within the first 3 months were 0. 11 (Fig. 1).
The total cohort comprised 1530 RRMS patients (585 with FTY, 651 with DMF and 294 with TERI). Baseline data are summarized in Table 2 and show a certain imbalance for some baseline variables. Inverse probability weighting was again used to analyse hazard ratios for treatment interruption, resulting in a weighted sample size of 3998 patients (1327 in the FTY, 1423 in the DMF, and 1248 in the TERI group) (Table S4).
The ARR for patients staying on treatment over the whole observation period (26.8 months, SD 16.7) was 0.18 (SD
Fig. 2 (a, b) Cumulative probability for disability progression sustained for 12 (a) and 24 weeks (b) within the first 24 months of RRMS treatment with fingolimod, dimethyl fumarate or teriflunomide. (c, d) Cumulative probability for disability regression sustained for 12 (c) and 24 weeks (d) within the first 24 months of RRMS treatment with fingolimod, dimethyl fumarate or teriflunomide. DMF, dimethyl fumarate

Discussion
In this observational study, we prospectively collected data to compare the efficacy of FTY, DMF and TERI in 629 patients who continuously received treatment for at least 24 months, and in a wider population of 1530 patients who had at least one follow-up visit subsequent to starting therapy. The different approved indications caused differences in the cohorts at baseline (Table 1). In particular, the TERI group was older and less likely to have had a relapse in the prior 12 months. Over 90% of the FTY patients had received prior treatment as compared to only 56% of the DMF and 63% of the TERI cohort. In contrast, DMF patients were younger and less disabled with shorter disease duration.
To account for these documented differences, we used inverse probability weighting (IPW) and, as a comparison method, propensity score (PS) matching. To demonstrate balance or imbalance after matching, we optimized our PS estimations for IPW for the Kolmogorov-Smirnov (KS) statistic (Tables S2 and S3). With both IPW and PS matching, the differences in ARR were not significant.
On the basis of these results, and because three treatments needed to be compared, we used IPW instead of PS matching. One reason is that PS matching would have generated three different two-group comparisons (FTY-DMF; TERI-DMF; FTY-TERI). This would have produced a different subpopulation for each treatment group in each of its comparisons with the other two treatment groups, depending on the PS overlap and the resulting matches. Here we saw the risk of comparing patients with the lowest scores in the treatment group with patients showing the highest scores in the control group. Besides losing the information from unmatched patients, we would thus have run the risk of comparing patients atypical for the respective treatment with patients who might be considered atypical for the control treatment. Furthermore, IPW offers the opportunity to use all patients in our populations, avoiding the problem of missing data, and allows all three treatments to be considered at once with the chosen models.
As a further measure to reduce bias, we decided to use all variables of the PS model also in the outcome models, leading to further adjustment for the treatment effects.
Comparing our present results with previously published 12-month data [20], we found significantly higher EDSS impairment, lower EDSS regression and a higher interruption rate in the TERI group. The longer observation period on treatment (at least 24 months) produced more robust data, especially with regard to disease progression and regression. Two previous studies also compared these oral MS drugs [16,17]. Ontaneda et al. analysed patients from a commercial claims database who switched from platform disease-modifying therapies (DMTs) to DMF, FTY or TERI and stayed on treatment for at least 3 months. Post-index ARRs were comparable between DMF and FTY but significantly lower with DMF than with TERI [16]. In contrast, Kalincik et al. [17] showed a lower ARR on FTY compared with DMF and TERI in an analysis of 614 (TERI), 782 (DMF) and 2332 (FTY) patients from the global MSBase cohort who stayed on treatment for at least 3 months. No differences in disability accumulation or improvement were found between these therapies.
In contrast to the aforementioned study, our whole study population had to be on treatment for at least 24 months, leading to an overall lower ARR and possibly resulting in more robust and comparable data. In addition, we used IPW instead of propensity score matching.
The hazard of treatment interruption was significantly higher with TERI than with DMF or FTY.
The main reasons for interrupting FTY and DMF were adverse events and patients' wishes, whereas for TERI the main reason was clearly disease progression, which resulted in a higher switching rate in the TERI cohort than among FTY- and DMF-treated patients.
These results are in contrast to Vollmer et al. [10], who found a lower discontinuation rate for FTY (34.3%) versus DMF (47.1%), driven by adverse events. Hersh et al. [12] also reported a higher likelihood of early discontinuation of DMF (41.3% versus 35.6%), mostly again due to adverse events.
Kalincik et al. [17] observed lower discontinuation rates (24% with DMF and TERI and 10% with FTY), and lack of efficacy was relatively more commonly reported in TERI and DMF patients in comparison with FTY.
Immortal time bias is a problem in studies that compare a treatment group for which a minimum survival time is a qualification condition with a control group without this condition. In our study, this qualification condition applied to all groups. Beforehand, we compared the interruption frequency between the treatment groups and observed comparable interruption rates in the first 24 months for the observed reasons (switch, pause and stop). The times until these events were also comparable. Differences in ARR were only observed in individual highly active patients in the early phase of the disease. When ARR was analysed over an observation period without a time limit, results were similar, and no significant differences were observed. In this evaluation, FTY showed the lowest ARR, followed by DMF and TERI. The lower ARR in the FTY cohort arose because FTY patients had longer observation periods than the DMF and TERI groups, resulting in fewer relapses in the later phase of the disease. Finally, we tried to avoid the immortal time bias that short observation periods would induce when analysing EDSS progression/regression confirmed at 3 and 6 months.
In summary, we believe the minimum qualification time should not produce a relevant bias for the comparison between the three treatment groups.
A strength of our study is that it presents data from a nationwide observational study comprising patients in Austria who have been treated with FTY, DMF and TERI since 2014. The AMSTR is a secure web-based platform that enables treating neurologists in all Austrian MS centres to document online immediately during patient visits. To ensure high documentation and data quality in terms of completeness and plausibility, the AMSTR is monitored by an external and independent clinical research organization. These real-world data show a low ARR, progression rate and discontinuation rate for all three oral drugs, reflecting the high quality of care for MS patients in Austria.
As an important limitation of our study, MRI data were only available at baseline before starting treatment with FTY, DMF and TERI and were included as an independent variable for propensity scoring and in the respective outcome models.
In conclusion, we found no difference in ARR or in the probability of experiencing a relapse among the three oral treatment regimens, but there were significant differences regarding (1) higher EDSS impairment, higher rates of treatment interruption and reduced sustained EDSS regression for 12 and 24 weeks for TERI compared with DMF and (2) reduced sustained EDSS progression for 12 weeks for DMF compared with FTY.