Background

Multiple sclerosis (MS) is a chronic, degenerative, and often progressive disease of the central nervous system that can affect multiple parts of the body and result in various symptoms including mobility restrictions, fatigue, pain, depression, and changes in vision and cognition [1]. These symptoms typically lead to a deterioration of the quality of life of persons with MS (PwMS) [2].

While there is no cure yet, key treatment for PwMS includes disease-modifying therapies (DMTs), which target the inflammatory response of the immune system resulting in an almost complete suppression of the disease activity (i.e., suppressing relapses and the occurrence of new lesions in the central nervous system), which in turn would prevent disease progression and may reduce or even avoid the development of disability [3, 4]. With this mechanistic approach in mind and due to the multilayered aspects of the disease activity, numerous clinical trials have evaluated the effects on disability worsening using the Expanded Disability Status Scale (EDSS), relapses, and/or magnetic resonance imaging (e.g., new lesions or brain atrophy) as primary end points [5]. However, patients, clinicians, guideline developers, medical associations, payers, and regulators have increasingly called for more evidence on the impact of innovative medicines for MS on patient-reported outcomes (PROs) [6,7,8].

PROs are any assessment of health status directly appraised and reported by patients, such as pain, fatigue, and in particular quality of life (QoL). QoL provides the patient’s unique perspective on both beneficial and harmful treatment effects, including treatment burden, symptom alleviation, side effects of treatment, and control of disease activity, and increases patients’ involvement in shared decision-making by making them key actors in assessing their health. A survey of over 2,000 persons with MS identified QoL measures, MS symptoms, and preservation of cognition as priority criteria when selecting a DMT [9]. Similarly, a recent initiative developed a patient-centered standard outcome set for MS identifying four domains of interest: disease activity, symptoms, functional status, and QoL [10].

The complexity of the measurement with often multiple domains or subscores and the heterogeneity of instruments being used [5, 11] render comparisons across DMTs difficult and informing treatment decisions hazardous. We aimed to explore which effect estimates on QoL assessed with which instruments are typically obtained in evaluations of DMTs, as a systematic benchmark for an evidence-based research agenda focused on patient-centered end points in MS.

Methods

We conducted a systematic analysis of clinical trials that assessed the effects of DMTs on QoL in PwMS. To structure our review report, we followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [12]. This study was not prospectively registered, and we did not critically appraise the included studies.

Eligibility criteria

We included randomized controlled trials (RCTs) comparing the effects of approved DMTs with any comparator (e.g., placebo, standard of care, active comparator) on at least one outcome referred to as “quality of life”. Trials were eligible if they reported these effects in an English-language journal article, or if they were registered in a clinical trial registry with QoL as a prespecified outcome and were categorized as “completed” and/or “has results”. There was no restriction on MS type, disease severity, setting, or year of publication. We considered approved DMTs identified in the European public assessment reports by the European Medicines Agency (EMA) [13] and the Drugs@FDA database by the Food and Drug Administration (FDA) [14] (as of September 29, 2022). We excluded trial protocols and trials labeled as ‘extension’.

Information sources and search strategy

We searched PubMed (last search: October 4, 2022; Supplementary file 1) using (i) a disease-specific search component (development informed by a recent Cochrane review [15] and several reviews on approved DMTs in MS [16,17,18]) combined with (ii) identified drug names and corresponding brand names of approved DMTs. Each DMT was transposed into the search strategy by combining a free text word (all fields search and automatic term mapping) with a related Medical Subject Heading (MeSH), if available, e.g., ‘Interferon OR Interferons[MeSH]’. To specifically identify RCTs, we limited the search hits using the PubMed-integrated filter ‘Randomized Controlled Trial’.
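The way each DMT was transposed into the search strategy can be illustrated with a minimal Python sketch. It is an illustration only and not the search syntax we ran (the full strategy is in Supplementary file 1): the helper names `drug_clause` and `build_query`, the simplified disease component, and the two-item drug list are assumptions made for the example.

```python
def drug_clause(free_text, mesh=None):
    """Combine a free-text term with its MeSH heading, if one is available."""
    if mesh is None:
        return free_text
    return f"{free_text} OR {mesh}[MeSH]"


def build_query(disease_component, drugs):
    """AND the disease component with the OR-combined drug clauses."""
    drug_component = " OR ".join(drug_clause(name, mesh) for name, mesh in drugs)
    return f"({disease_component}) AND ({drug_component})"


# Illustrative inputs only (hypothetical, heavily shortened).
disease = "Multiple Sclerosis[MeSH] OR multiple sclerosis"
drugs = [
    ("Interferon", "Interferons"),  # free text plus MeSH heading
    ("Gilenya", None),              # brand name, searched as free text only
]

query = build_query(disease, drugs)
print(query)
```

In this sketch, a term without a MeSH heading is passed through as free text, which mirrors the "if available" condition described above. The ‘Randomized Controlled Trial’ filter is applied in the PubMed interface and is therefore not part of the string.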

We searched ClinicalTrials.gov (last search: April 6, 2023; Supplementary file 1) using the search fields condition (‘Multiple sclerosis’), outcome measure (‘Quality of life’), and intervention (list of identified DMTs as used for the free text PubMed search); limited to trials (‘Interventional’), described as “completed” or “has results”. Registry entries with results were considered for study selection and data extraction. Registry entries of completed trials without posted results were followed to identify journal articles by searching PubMed (all fields search) and Google Scholar using the trial registry number (last search: April 13, 2023).

Study selection

All full texts of the retrieved journal articles were collected and considered potentially eligible, since QoL outcome measures may not be reported in the title and/or abstract. One reviewer (out of JH, PJ, KD, and TVN) screened the full texts for eligibility, with confirmation by a second reviewer (out of JH, PJ, and LGH) in any unclear case.

Data extraction

One reviewer (out of JH, KD, or PJ) extracted information on trial sample (type of MS, number of randomized participants), intervention and comparator, outcome (number, name, type, hierarchy, and assessor of QoL measures and subscales used, longest follow-up length available, outcome prespecification), design characteristics (blinding and number of trial arms), QoL results (within and between randomized group comparisons; based on registry entries, if available), and bibliographic information (i.e., metadata such as authors, publication year, and journal). Data extraction was based on publication(s) and/or a corresponding registry entry using an electronic spreadsheet.

Data analysis

We summarized the trial characteristics using descriptive statistics. We considered all individual subscales of QoL instruments as QoL measures. QoL instruments were categorized as generic or disease-specific measures according to the description of the QoL instrument. Symptom-specific QoL instruments were categorized as disease-specific measures.

We considered comparative effects (i.e., between randomized groups) as QoL results. Data on QoL not reflecting effects of treatments (e.g., before–after changes within study arms) were not considered. We stratified QoL results by type of outcome measure (i.e., disease-specific or generic QoL measure). We categorized QoL results into (1) quantitative QoL effects (between-group differences with dispersion were reported, e.g., mean differences with standard deviation, or data on change from baseline to follow-up were reported that allowed us to calculate between-group differences); (2) QoL effects with p value alone (not providing data to derive quantitative effects); and (3) narrative statements on between-group differences. To assess the statistical significance of the QoL results, we used reported p values (with p < 0.05 indicating statistical significance), reported or self-calculated confidence intervals (CI; with the CI of a mean difference not crossing the null indicating statistical significance), or statements by authors declaring the results as “statistically significant”. To allow the comparison of effect sizes across trials, we converted quantitative QoL effects to Cohen’s d and Hedges’ g whenever possible. We also report the proportion of effect sizes exceeding the minimal clinically important difference (MCID) of 0.2 (Hedges’ g), as defined by current guidelines for health-technology assessment and reimbursement decisions on quality of life assessments [19]. We used R (version 4.2.2) for data analysis.
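The conversion to standardized effect sizes can be illustrated with a minimal sketch. It is written in Python for illustration and is not the R code used for the analysis. It assumes the common pooled-standard-deviation form of Cohen’s d, the usual small-sample correction factor for Hedges’ g, and a large-sample normal approximation for the CI; the function names and the input values are hypothetical.

```python
import math
from statistics import NormalDist

MCID = 0.2  # threshold on the Hedges' g scale, as stated in the text


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def hedges_g(d, n_t, n_c):
    """Apply the approximate small-sample correction to Cohen's d."""
    return d * (1 - 3 / (4 * (n_t + n_c) - 9))


def confidence_interval(effect, n_t, n_c, level=0.95):
    """Large-sample normal-approximation CI for a standardized mean difference."""
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + effect**2 / (2 * (n_t + n_c)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return effect - z * se, effect + z * se


# Hypothetical trial: mean change from baseline in a QoL score per arm.
d = cohens_d(mean_t=5.0, sd_t=10.0, n_t=100, mean_c=2.0, sd_c=10.0, n_c=100)
g = hedges_g(d, 100, 100)
low, high = confidence_interval(g, 100, 100)

significant = low > 0 or high < 0  # CI does not cross the null
exceeds_mcid = abs(g) > MCID       # magnitude compared with the threshold
print(round(d, 3), round(g, 3), significant, exceeds_mcid)
```

Comparing the absolute value of g with the threshold is a choice made for this sketch; whether a result that favors the comparator should count as exceeding the MCID is a separate analytic decision.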

Results

We identified 38 eligible trials that reported QoL results in 40 publications between 1999 and 2023 (median publication year: 2015; Table 1; Supplementary file 2), with 34 trials (89%) published in journal articles [20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50] and 4 trials (11%) registered on ClinicalTrials.gov [51,52,53,54].

Table 1 Summary characteristics of trials reporting on QoL effects of DMTs (n = 38)

Trial characteristics

The 38 trials had a median of 531 participants (interquartile range (IQR) 202 to 941; total 23,225 PwMS). Studies were double blinded (n = 23; 61%), single blinded (n = 1; 3%), or open label (n = 14; 37%). Twenty-four trials (63%) included PwMS with relapsing–remitting MS only, 7 trials (18%) included people with multiple types of MS, and 4 trials (11%) included only participants with secondary progressive MS. The remaining three trials included only participants with primary progressive MS (n = 1; 3%), clinically isolated syndrome (n = 1; 3%), or pediatric-onset MS (n = 1; 3%), respectively. The trials evaluated 13 different DMTs, mostly interferon-beta (n = 10; 26%), fingolimod (n = 7; 18%), natalizumab (n = 5; 13%), and glatiramer acetate (n = 4; 11%), and had two to four arms (median: 2; IQR: 2 to 3). Only 18 trials (47%) prespecified the QoL outcomes in a clinical trial registry, and 15 trials (40%) posted results in the registry. Follow-up was 3–36 months (22 trials had a follow-up ≥ 24 months; 58%; Tables 1, 2).

Table 2 Characteristics of 38 trials reporting on QoL effects of DMTs

QoL measure characteristics

The 38 trials used 110 QoL measures or subscales of measures (a median of 2 QoL measures per trial; IQR 1–3; range 1–11), collected with 18 different QoL instruments (9 generic QoL instruments, 8 disease-specific QoL instruments, and 1 QoL instrument that covers both generic and disease-specific subscales). A generic QoL measure was used in 29 trials (76%), mostly the Short Form 36 (SF-36; n = 18; 47%) and the European Quality of Life 5 Dimension (EQ-5D; n = 9; 24%). A disease-specific QoL measure was used in 19 trials (50%), mostly the Functional Assessment of Multiple Sclerosis (FAMS; n = 4; 11%), the Multiple Sclerosis Impact Scale-29 (MSIS-29; n = 4; 11%), and the Multiple Sclerosis Quality of Life-54 (MSQOL-54; n = 4; 11%). QoL was never the sole primary outcome; only two trials (5%) used it as a co-primary outcome, and it was most often a secondary outcome (n = 23; 61%). In most trials, QoL outcomes were assessed by patients themselves (n = 25; 69%) and in 1 trial (3%) by parents of affected children; no detailed information on outcome assessment was reported in 15 trials (40%). Outcome assessment was mostly blinded (n = 23 trials; 61%; Table 3).

Table 3 Characteristics of 110 QoL measures used in all 38 trials

Reporting of quality-of-life results

Because multiple trials had more than two arms and used multiple QoL measures, there were up to 30 QoL results per trial with a total of 203 QoL results across the 38 trials (median of 3 per trial; IQR 2–6; range 1–30).

We identified quantitative QoL results in 24 of the 38 trials (63%; 89 of 203 QoL results), QoL effects with p value alone in 8 trials (21%; 27 of 203 QoL results), and narrative statements in 15 trials (39%; 57 of 203 QoL results). All trials reported at least one QoL result, and data were available for 173 of the 203 QoL results (85%). For the remaining 30 QoL results (15%; from 6 of 38 trials, 16%), the outcome was prespecified in the registry and/or mentioned in the article, but we identified no data (Table 4).

Table 4 Reporting of QoL results and impact of DMT on QoL (n = 203 QoL results in n = 38 included trials)

Impact of DMTs on QoL

In 16 trials (42%), at least one of the multiple QoL results was reported as statistically significant. These results always favored the experimental DMT, with the exception of one QoL result in 1 trial (3%) where placebo was favored (d = −0.5; 95% CI −0.66 to −0.34 [42]; Supplementary file 1). Only six trials (16%) had statistically significant results in favor of the experimental DMT across all QoL results (comparisons and subscales) reported for the respective trials.

Out of all 173 reported QoL results, 50 were statistically significant and 123 were not (cases with quantitative results: 28 statistically significant and 61 with no group differences; cases with p values alone: 17 statistically significant and 10 with no group differences; and cases with narrative statements: 5 statistically significant and 52 with no group differences; Table 4; Fig. 1). Among the 89 quantitative results, 28 Hedges' g point estimates were larger than the MCID threshold of 0.2, of which 24 were statistically significant. Conversely, four of the statistically significant Hedges' g estimates did not reach the MCID threshold.

Fig. 1 Distribution of p values (116 QoL results from 30 trials). Note: The 116 QoL results come from 89 quantitative results and 27 reported as p values alone. Abbreviations: DMT disease-modifying therapy, EQ-5D European Quality of Life Dimensions, FAMS Functional Assessment of Multiple Sclerosis, GHQ VAS General Health Questionnaire Visual Analog Scale, HAQUAMS Hamburg Quality of Life Questionnaire for Multiple Sclerosis, MSIS Multiple Sclerosis Impact Scale, MSQLI Multiple Sclerosis Quality of Life Inventory, MSQOL Multiple Sclerosis Quality of Life, MSTCQ Multiple Sclerosis Treatment Concerns Questionnaire, MusiQoL Multiple Sclerosis International Quality of Life, n number, NEI-VFQ National Eye Institute Visual Functioning Questionnaire, QoL quality of life, SF Short Form

In the nine trials (24%) with significant quantitative QoL results, the effect sizes of DMTs on QoL were large (median Cohen’s d 1.02; IQR 0.3–1.7; median Hedges’ g 1.01; IQR 0.3–1.69) and ranged between d 0.14 and 2.91 (Supplementary file 1).

Discussion

Our systematic search found 38 trials that reported the effects of DMTs on QoL outcomes in PwMS, although QoL was rarely a primary outcome. The trials used many different QoL subscales that were collected by multiple generic and MS-specific QoL instruments. Almost half of all trials reported at least one statistically significant QoL effect out of the multiple subscales and comparisons reported by the respective trials. This agrees with a review from 2017, which found for earlier studies that positive effects of DMTs are reported across very different scales of health-related QoL [17]. However, the number of trials with statistically significant results across all QoL results reported for the respective trials was limited to only 6 of the 38 trials. Furthermore, of the 203 QoL results across all trials, only 50 were statistically significant. Statistically significant results on some subscales coexisted with non-significant results on others, sometimes even within the same instrument.

While we did not systematically assess the quality of reporting, e.g., by using the CONSORT extension for PROs [55], we observed that results on QoL were not consistently reported across trials or within trials. The inability to derive quantitative results from the reported data for almost half of our comparisons is indicative of substantial reporting deficiencies within this field and may indicate strong reporting bias. This observation is supported by a recent meta-epidemiological analysis showing the inadequate reporting of QoL in neuroscience [56]. In addition, given the inconsistencies in reporting effects within trials, one cannot exclude the possibility of selective reporting of results. For example, within a trial, not all QoL results were reported, or they were reported with different levels of detail (e.g., quantitatively for some and narratively for others). While our work cannot replace a thorough assessment of the full body of evidence on the multiple clinical questions the 38 studies aimed to address, some issues are noteworthy. As outlined above, we observed substantial reporting deficits with risk of reporting bias, the lack of blinding in some cases, and the presence of small sample sizes with imprecise effect estimates, which would all likely reduce the certainty of evidence regarding the effect of DMTs on QoL in Grading of Recommendations Assessment, Development and Evaluation (GRADE) assessments and subsequent recommendations in clinical guidelines [57]. This highlights the urgent need to improve the research agenda, with implications not only for research but also for clinical decisions.

QoL is a subjective measure highly dependent on individual patient factors (e.g., mood) and external factors (e.g., socioeconomic status) [58]. Instruments and even subscales might have different meanings for different patients. This could explain the profusion of existing instruments and subscales, as illustrated by our results and also highlighted in a 2017 systematic review, which identified 402 PROs used in MS observational studies and RCTs, of which 82 were MS specific and 10 focused specifically on QoL [11]. The choice of the instrument and its fitness for purpose is therefore even more essential, yet many instruments used in MS have poor content validity [11, 59]. The complexity of choosing and interpreting an instrument most likely also explains why QoL measures only play a peripheral role in trials assessing DMTs in PwMS [5, 60], being mostly assessed as secondary or exploratory outcomes, as shown by our results. Overall, the multitude of subscales, the heterogeneity of results, and inconsistent reporting currently make it very difficult to inform a clinical decision regarding the impact of DMTs on health-related quality of life.

In late 2022, a patient-centered outcome set was developed according to the COMET initiative, recommending the use of the MSIS-29 [10]. Our analysis provides a baseline benchmark that may allow its uptake to be assessed.

Limitations

Our work has some limitations. First, we did not assess the quality of the evidence and the risk of bias, but we assessed the study design characteristics, of which some indicate quality (sample size to provide precise effects) and risk of bias (e.g., blinding in a scenario of assessing subjective patient-reported outcomes).

Second, we have not conducted an exhaustive search in multiple databases with very sensitive search filters; however, we have conducted a complete assessment of all full texts (not only relying on abstract information as in a typical systematic review) and complemented the search with extensive assessment of clinical trial registries. We considered QoL results that were reported in peer-reviewed journal articles. We did not consider trial protocols. There might be more results on QoL outcomes that have been prespecified in trial protocols, but not reported in the publication of trial results. If QoL outcomes were not mentioned in the publication of trial results or in the trial registry, we would have overlooked them. We clearly assume a lack of reporting of QoL results. This would mean that the underreporting and selective reporting of results were underestimated in our sample.

Third, eligible trials were selected and extracted by only one reviewer. However, as we directly checked all full texts for whether they assessed and reported the impact of DMTs on QoL, and the involved reviewers were experienced in these methods, we assume that this did not lead to considerable data errors.

Fourth, QoL is often measured longitudinally at multiple time points, yet we only extracted and assessed the measurement with the longest follow-up available. QoL measurements may also be prone to response shift, whereby the individual’s reference of well-being changes over time [61], potentially impacting estimation of QoL in clinical trials with longitudinal follow-up [62].

Finally, this is not a systematic review on the benefits and harms of all the DMTs to inform treatment decisions, but a meta-research survey on one type of outcome to inform further research on QoL in MS. Beyond the empirical information provided by this study, further important characteristics of the instruments need to be considered in the selection of quality-of-life instruments, such as the psychometric properties, ease of administration, and domains covered. QoL assessment and reporting are currently suboptimal, although QoL is essential to inform patient-relevant decision-making. Uncertainties remain, but gaps can only be filled by having QoL systematically assessed and reported. More insight into treatment effect sizes can support sample size calculations by providing the range of effect sizes that can be expected. With the rise of digital health measures, assessment of QoL might gain in granularity and efficiency, opening new horizons toward a more personalized QoL assessment.
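How an expected effect size feeds into a sample size calculation can be illustrated with a minimal Python sketch. It assumes a two-arm parallel trial with equal allocation, a two-sided test, and the normal-approximation formula n = 2(z₁₋α/₂ + z₁₋β)² / d² per arm; the function name and the chosen α and power are assumptions for the example, not recommendations derived from our data.

```python
import math
from statistics import NormalDist


def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate participants per arm to detect a standardized mean difference."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size**2)


# 0.2 is the MCID threshold used in this study; 0.5 is a hypothetical larger effect.
for d in (0.2, 0.5):
    print(d, n_per_arm(d))
# 0.2 -> 393 per arm, 0.5 -> 63 per arm
```

Under these assumptions, a trial powered to detect an effect at the MCID threshold would need roughly six times as many participants per arm as one powered for an effect of 0.5, which is why a realistic range of expected effect sizes matters at the design stage.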

Conclusions

Our results indicate the potential of certain DMTs to positively impact the QoL of PwMS. However, there seems to be no generally accepted standard for assessing disease-specific QoL in PwMS, and this end point is rarely the focus of trials evaluating DMTs. The critical importance of QoL for PwMS urgently needs to be better reflected in the design, registration, and reporting of future MS clinical trials. This meta-research survey serves as a resource for researchers, clinicians, and policymakers, promoting a deeper understanding of the interplay between DMTs and QoL in the context of MS.