Clinical trial evidence of quality-of-life effects of disease-modifying therapies for multiple sclerosis: a systematic analysis

Background: Increasingly, patients, clinicians, and regulators call for more evidence on the impact of innovative medicines on quality of life (QoL). We assessed the effects of disease-modifying therapies (DMTs) on QoL in people with multiple sclerosis (PwMS).
Methods: Randomized trials assessing approved DMTs in PwMS with results for at least one outcome referred to as “quality of life” were searched in PubMed and ClinicalTrials.gov.
Results: We identified 38 trials published between 1999 and 2023 with a median of 531 participants (interquartile range (IQR) 202 to 941; total 23,225). The evaluated DMTs were mostly interferon-beta (n = 10; 26%), fingolimod (n = 7; 18%), natalizumab (n = 5; 13%), and glatiramer acetate (n = 4; 11%). The 38 trials used 18 different QoL instruments, with up to 11 QoL subscale measures per trial (median 2; IQR 1–3). QoL was never the single primary outcome. We identified quantitative QoL results in 24 trials (63%), and narrative statements in 15 trials (39%). In 16 trials (42%), at least one of the multiple QoL results was statistically significant. The effect sizes of the significant quantitative QoL results were large (median Cohen’s d 1.02; IQR 0.3–1.7; median Hedges’ g 1.01; IQR 0.3–1.69) and ranged between d 0.14 and 2.91.
Conclusions: Certain DMTs have the potential to positively impact QoL of PwMS, and the assessment and reporting of QoL is suboptimal with a multitude of diverse instruments being used. There is an urgent need that design and reporting of clinical trials reflect the critical importance of QoL for PwMS.
Supplementary Information: The online version contains supplementary material available at 10.1007/s00415-024-12366-5.


Background
Multiple sclerosis (MS) is a chronic, degenerative, and often progressive disease of the central nervous system that can affect multiple parts of the body and result in various symptoms including mobility restrictions, fatigue, pain, depression, and changes in vision and cognition [1]. These symptoms typically lead to a deterioration of the quality of life of persons with MS (PwMS) [2].
While there is no cure yet, key treatment for PwMS includes disease-modifying therapies (DMTs), which target the inflammatory response of the immune system resulting in an almost complete suppression of the disease activity (i.e., suppressing relapses and the occurrence of new lesions in the central nervous system), which in turn would prevent disease progression and may reduce or even avoid the development of disability [3,4]. With this mechanistic approach in mind and due to the multilayered aspects of the disease activity, numerous clinical trials have evaluated the effects on disability worsening using the Expanded Disability Status Scale (EDSS), relapses, and/or magnetic resonance imaging (e.g., new lesions or brain atrophy) as primary end points [5]. However, patients, clinicians, guideline developers, medical associations, payers, and regulators have increasingly called for more evidence on the impact of innovative medicines for MS on patient-reported outcomes (PROs) [6][7][8].
PROs are any assessment of health status directly appraised and reported by patients, such as pain, fatigue, and in particular quality of life (QoL). QoL provides the patient's unique perspective on both beneficial and harmful treatment effects, including treatment burden, symptom alleviation, side effects of treatment, and control of disease activity, and increases patients' involvement in shared decision-making by making them key actors in assessing their health. A survey of over 2,000 persons with MS identified QoL measures, MS symptoms, and preservation of cognition as priority criteria when selecting a DMT [9]. Similarly, a recent initiative developed a patient-centered standard outcome set for MS identifying four domains of interest: disease activity, symptoms, functional status, and QoL [10].
The complexity of the measurement, with often multiple domains or subscores, and the heterogeneity of instruments being used [5,11] render comparisons across DMTs difficult and make informing treatment decisions hazardous. We aimed to explore which effect estimates on QoL, assessed with which instruments, are typically obtained in evaluations of DMTs, as a systematic benchmark for an evidence-based research agenda focused on patient-centered end points in MS.

Methods
We conducted a systematic analysis of clinical trials that assessed the effects of DMTs on QoL in PwMS. To structure our review report, we followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [12]. This study was not prospectively registered, and we did not critically appraise the included studies.

Eligibility criteria
We included randomized controlled trials (RCTs) comparing the effects of approved DMTs with any comparator (e.g., placebo, standard of care, active comparator) on at least one outcome referred to as "quality of life". Trials were eligible if they reported these effects in an English-language journal article or if they were registered in a clinical trial registry with QoL as a prespecified outcome and were categorized as "completed" and/or "has results". There was no restriction on MS type, disease severity, setting, or year of publication. We considered approved DMTs identified in the European public assessment reports by the European Medicines Agency (EMA) [13] and the Drugs@FDA database by the Food and Drug Administration (FDA) [14] (as of September 29, 2022). We excluded trial protocols and trials labeled as 'extension'.

Information sources and search strategy
We searched PubMed (last search: October 4, 2022; Supplementary file 1) using (i) a disease-specific search component (development informed by a recent Cochrane review [15] and several reviews on approved DMTs in MS [16][17][18]) combined with (ii) identified drug names and corresponding brand names of approved DMTs. Each DMT was transposed into the search strategy by combining a free text word (all fields search and automatic term mapping) with a related Medical Subject Heading (MeSH), if available, e.g., 'Interferon OR Interferons[MeSH]'. To specifically identify RCTs, we limited the search hits using the PubMed-integrated filter 'Randomized Controlled Trial'.
We searched ClinicalTrials.gov (last search: April 6, 2023; Supplementary file 1) using the search fields condition ('Multiple sclerosis'), outcome measure ('Quality of life'), and intervention (list of identified DMTs as used for the free text PubMed search); limited to trials ('Interventional'), described as "completed" or "has results". Registry entries with results were considered for study selection and data extraction. Registry entries of completed trials without posted results were followed to identify journal articles by searching PubMed (all fields search) and Google Scholar using the trial registry number (last search: April 13, 2023).
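The way a PubMed query of this shape can be assembled is sketched below in Python. This is an illustration only: the actual search strings are those in Supplementary file 1. Apart from the 'Interferon OR Interferons[MeSH]' example taken from the text, the term list, the disease component, the helper names `drug_component` and `build_query`, and the use of the `[pt]` publication-type tag as a text equivalent of the integrated filter are all assumptions made for this example.

```python
# Illustrative sketch only; not the authors' actual search strategy.
# Hypothetical excerpt: free-text drug term -> related MeSH heading
# (None when no related heading is assumed to be available).
DMT_TERMS = {
    "Interferon": "Interferons",   # example given in the text
    "Natalizumab": "Natalizumab",  # assumed mapping, for illustration
    "Tysabri": None,               # brand name, free text only (assumption)
}


def drug_component(terms):
    """Join each free-text term with its MeSH heading, if any, using OR."""
    parts = []
    for free_text, mesh in terms.items():
        parts.append(free_text)
        if mesh is not None:
            parts.append(f"{mesh}[MeSH]")
    return " OR ".join(parts)


def build_query(disease_component, terms):
    """Combine the disease component, drug component, and an RCT restriction."""
    # Assumed text form of the 'Randomized Controlled Trial' filter.
    rct_filter = "Randomized Controlled Trial[pt]"
    return f"({disease_component}) AND ({drug_component(terms)}) AND {rct_filter}"


# Hypothetical disease component; the real one was informed by refs. [15-18].
query = build_query('"multiple sclerosis" OR Multiple Sclerosis[MeSH]', DMT_TERMS)
print(query)
```

Keeping the drug list as data separate from the query logic makes it straightforward to reuse the same list for the free-text intervention field of a registry search.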

Study selection
All full texts of the retrieved journal articles were collected and considered as potentially eligible, since outcome measures are often not listed in the title and/or abstract. Full texts were screened for eligibility by one reviewer (out of JH, PJ, KD, and TVN), with confirmation by a second reviewer in any unclear case (out of JH, PJ, and LGH).

Data extraction
One reviewer (out of JH, KD, or PJ) extracted information on trial sample (type of MS, number of randomized participants), intervention and comparator, outcome (number, name, type, hierarchy, and assessor of QoL measures and subscales used, longest follow-up length available, outcome prespecification), design characteristics (blinding and number of trial arms), QoL results (within and between randomized group comparisons; based on registry entries, if available), and bibliographic information (i.e., metadata such as authors, publication year, and journal). Data extraction was based on publication(s) and/or a corresponding registry entry using an electronic spreadsheet.

Data analysis
We summarized the trial characteristics using descriptive statistics. We considered all individual subscales of QoL instruments as QoL measures. QoL instruments were categorized as generic or disease-specific measures according to the description of the QoL instrument. Symptom-specific QoL instruments were categorized as disease-specific measures.
We considered comparative effects (i.e., between randomized groups) as QoL results. Data on QoL not reflecting effects of treatments (e.g., before-after changes within study arms) were not considered. We stratified QoL results by type of outcome measure (i.e., disease-specific or general QoL measure). We categorized QoL results into (1) quantitative QoL effects (between-group differences with dispersion were reported, e.g., mean differences with standard deviation, or data on change from baseline to follow-up were reported that allowed us to calculate between-group differences); (2) QoL effects with p value alone (not providing data to derive quantitative effects); (3) narrative statements on between-group differences. To assess the statistical significance of the QoL results, we used reported p values (with p < 0.05 indicating statistical significance), reported or self-calculated confidence intervals (CI; with the confidence interval of a mean difference not crossing the null indicating statistical significance), or statements by authors declaring the results as "statistically significant". To allow the comparison of effect sizes across trials, whenever possible, we converted quantitative QoL effects to Cohen's d and Hedges' g. We also report the proportion of effect sizes exceeding the minimal clinically important difference (MCID) of 0.2 (Hedges' g), as defined by current guidelines for health-technology assessment and reimbursement decisions on quality of life assessments [19]. We used R (version 4.2.2) for data analysis.
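The conversion and significance rules described above can be sketched as follows. The analysis itself was run in R and the exact formulas are not stated in the text, so this Python sketch assumes the standard textbook definitions: Cohen's d with a pooled standard deviation, the usual small-sample correction for Hedges' g, and a normal-approximation 95% confidence interval. The function names and the example numbers are hypothetical, and the example assumes a scale on which higher scores mean better QoL.

```python
import math

MCID = 0.2  # threshold on the Hedges' g scale, as used in the text [19]


def cohens_d(mean_exp, sd_exp, n_exp, mean_ctl, sd_ctl, n_ctl):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_exp - 1) * sd_exp**2 + (n_ctl - 1) * sd_ctl**2) / (
        n_exp + n_ctl - 2
    )
    return (mean_exp - mean_ctl) / math.sqrt(pooled_var)


def hedges_g(d, n_exp, n_ctl):
    """Apply the approximate small-sample correction factor to Cohen's d."""
    df = n_exp + n_ctl - 2
    return d * (1 - 3 / (4 * df - 1))


def ci95(effect, n_exp, n_ctl):
    """Approximate 95% CI from the large-sample standard error of d or g."""
    n_total = n_exp + n_ctl
    se = math.sqrt(n_total / (n_exp * n_ctl) + effect**2 / (2 * n_total))
    return effect - 1.96 * se, effect + 1.96 * se


def is_significant(ci):
    """A result counts as significant when the CI does not cross the null (0)."""
    low, high = ci
    return low > 0 or high < 0


# Hypothetical trial arm data: mean change from baseline, SD, and group size.
d = cohens_d(mean_exp=5.0, sd_exp=10.0, n_exp=100,
             mean_ctl=2.0, sd_ctl=10.0, n_ctl=100)
g = hedges_g(d, 100, 100)
ci = ci95(g, 100, 100)
print(f"d = {d:.2f}, g = {g:.3f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print("significant:", is_significant(ci), "| exceeds MCID:", abs(g) > MCID)
```

With large groups, d and g are nearly identical, which is consistent with the very similar medians reported for the two measures in this study.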

Reporting of quality-of-life results
Because multiple trials had more than two arms and used multiple QoL measures, there were up to 30 QoL results per trial with a total of 203 QoL results across the 38 trials (median of 3 per trial; IQR 2-6; range 1-30).
We identified quantitative QoL results in 24 out of the 38 trials (63%; 89 of 203 QoL results), QoL effects with p value alone in 8 trials (21%; 27 of 203 QoL results), and narrative statements in 15 trials (39%; 57 out of 203 QoL results). All trials reported at least one QoL result (85%; 173 of 203 QoL results), but not all results were reported: 30 of 203 QoL results (15%) were prespecified in the registry and/or mentioned in the article, but we identified no data (6 of 38 trials; 16%; Table 4).

Impact of DMTs on QoL
In 16 trials (42%), at least one of the multiple QoL results was indicated as statistically significant. These results always favored the experimental DMT, with the exception of one QoL result in 1 trial (3%) where placebo was favored (d −0.5; 95% CI −0.66 to −0.34 [42]; Supplementary file 1). Only six trials (16%) had statistically significant results in favor of the experimental DMT across all QoL results (comparisons and subscales) reported for the respective trials.
Out of all 173 reported QoL results, 50 were statistically significant and 123 were not (cases with quantitative results: 28 statistically significant and 61 with no group differences; cases with p values alone: 17 statistically significant and 10 with no group differences; and cases with narrative statements: 5 statistically significant and 52 with no group differences; Table 4; Fig. 1). Among the 89 quantitative results, 28 point estimates of Hedges' g were larger than the 0.2 MCID threshold, of which 24 were statistically significant. Conversely, four of the statistically significant Hedges' g values did not reach the MCID threshold.
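The counts reported in this section can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers stated in the text; the variable names are illustrative, and the overall share of significant results is derived here rather than quoted from the article.

```python
# Counts of QoL results by reporting category, as stated in the text.
results = {
    "quantitative": {"significant": 28, "not_significant": 61},
    "p_value_alone": {"significant": 17, "not_significant": 10},
    "narrative": {"significant": 5, "not_significant": 52},
}
total_results = 203  # all QoL results, including those without data

# Totals across reporting categories.
reported = sum(sum(counts.values()) for counts in results.values())
significant = sum(counts["significant"] for counts in results.values())
not_significant = reported - significant
missing = total_results - reported

print(f"reported: {reported} of {total_results} "
      f"({100 * reported / total_results:.0f}%)")
print(f"no data identified: {missing} ({100 * missing / total_results:.0f}%)")
print(f"significant: {significant}, not significant: {not_significant}")
# Derived value, not quoted from the article:
print(f"share of reported results that were significant: "
      f"{100 * significant / reported:.0f}%")

# Quantitative results relative to the 0.2 MCID threshold (Hedges' g).
above_mcid = 28
above_mcid_and_significant = 24
significant_below_mcid = (
    results["quantitative"]["significant"] - above_mcid_and_significant
)
above_mcid_not_significant = above_mcid - above_mcid_and_significant
print(f"significant but below MCID: {significant_below_mcid}")
print(f"above MCID but not significant: {above_mcid_not_significant}")
```

The two counts of 28 (significant quantitative results, and point estimates above the MCID) coincide but describe different sets, which overlap in 24 results.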
In the nine trials (24%) with significant quantitative QoL results, the effect sizes of DMTs on QoL were large (median Cohen's d 1.02; IQR 0.3-1.7; median Hedges' g 1.01; IQR 0.3-1.69) and ranged between d 0.14 and 2.91 (Supplementary file 1).
[...] reported at least one statistically significant QoL effect out of the multiple subscales and comparisons reported by the respective trials. This agrees with a review from 2017 which found for earlier studies that positive effects of DMTs are reported across very different scales of health-related QoL [17]. However, the number of trials with statistically significant results across all QoL results reported for the respective trials was limited, only 6 out of the 38 trials. Furthermore, out of the 203 QoL results reported across all trials, only 50 were statistically significant. Statistically significant results on some subscales coexisted with non-significant results on others, sometimes even within the same instrument.
While we did not systematically assess the quality of reporting, e.g., by using the CONSORT extension for PROs [55], we observed that results on QoL were not consistently reported across all trials and within trials. The inability to derive quantitative results from the reported data for almost half of our comparisons is indicative of substantial reporting deficiencies within this field and may indicate strong reporting bias. This observation is supported by a recent metaepidemiological analysis showing the inadequate reporting of QoL in neuroscience [56]. In addition, given the inconsistencies in reporting effects within trials, one cannot exclude the possibility of selective reporting of results. For example, within a trial, not all QoL results were reported, or they were reported with different levels of detail (e.g., quantitatively for some and narratively for others). While our work cannot replace a thorough assessment of the full body of evidence on the multiple clinical questions the 38 studies aimed to address, some issues are noteworthy. As outlined above, we observed substantial reporting deficits with risk of reporting bias, the lack of blinding in some cases, and the presence of small sample sizes with imprecise effect estimates, which would all likely reduce the certainty of evidence regarding the effect of DMTs on QoL in Grades of Recommendation, Assessment, Development and Evaluation (GRADE) assessments and subsequent recommendations in clinical guidelines [57]. This highlights the urgent need to improve the research agenda, with implications not only for research but also for clinical decisions.
QoL is a subjective measure highly dependent on individual patient factors (e.g., mood) and external factors (e.g., socioeconomic status) [58]. Instruments and even subscales might have different meanings for different patients. This could explain the profusion of existing instruments and subscales as illustrated by our results and also highlighted in a 2017 systematic review, which identified 402 PROs used in MS observational studies and randomized trials (RCTs), of which 82 were MS specific and 10 focused specifically on QoL [11]. The choice of the instrument and its fitness for purpose is therefore even more essential, yet many instruments used in MS have poor content validity [11,59]. The complexity in the choice of the instrument and its interpretation most likely also explains why QoL measures only play a peripheral role in trials assessing DMTs in PwMS [5,60], being mostly assessed as secondary or exploratory outcomes, as shown by our results. Overall, the multitude of subscales, together with the heterogeneity of results and inconsistent reporting, currently very often makes it difficult to infer a clinical decision regarding the impact of DMTs on health-related quality of life.
In late 2022, a patient-centered outcome set was developed according to the COMET initiative, recommending the use of MSIS-29. Our analysis provides a baseline benchmark that may allow its uptake to be assessed [10].

Limitations
Our work has some limitations. First, we did not assess the quality of the evidence and the risk of bias, but we assessed the study design characteristics, of which some indicate quality (sample size to provide precise effects) and risk of bias (e.g., blinding in a scenario of assessing subjective patient-reported outcomes).
Second, we have not conducted an exhaustive search in multiple databases with very sensitive search filters; however, we have conducted a complete assessment of all full texts (not only relying on abstract information as in a typical systematic review) and complemented the search with extensive assessment of clinical trial registries. We considered QoL results that were reported in peer-reviewed journal articles. We did not consider trial protocols. There might be more results on QoL outcomes that have been prespecified in trial protocols, but not reported in the publication of trial results. If QoL outcomes were not mentioned in the publication of trial results or in the trial registry, we would have overlooked them. We clearly assume a lack of reporting of QoL results. This would mean that the underreporting and selective reporting of results were underestimated in our sample.
Third, eligible trials were selected and extracted by only one reviewer. However, as we directly searched all full texts for whether they assessed and reported the impact of DMTs on QoL, and the involved reviewers were experienced in these methods, we assume that this did not lead to considerable data errors.
Fourth, QoL is often measured longitudinally at multiple time points, yet we only extracted and assessed the measurement with the longest follow-up available. QoL measurements may also be prone to response shift, whereby the individual's reference of well-being changes over time [61], potentially impacting estimation of QoL in clinical trials with longitudinal follow-up [62].
Finally, this is not a systematic review on the benefits and harms of all the DMTs to inform treatment decisions, but a meta-research survey on one type of outcome to inform further research on QoL in MS. Beyond the empirical information provided by this study, further important characteristics of the instruments need to be considered in the selection of quality-of-life instruments, such as the psychometric properties, ease of administration, and domains covered. QoL assessment and reporting is currently suboptimal, although QoL is essential to inform patient-relevant decision-making. Uncertainties remain, but gaps can only be filled by having QoL systematically assessed and reported. More insight into treatment effect sizes can support sample size calculations by providing the range of effect sizes that can be expected. With the rise of digital health measures, assessment of QoL might gain in granularity and efficiency, opening new horizons toward a more personalized QoL assessment.
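How an expected effect size feeds into a sample size calculation can be illustrated with the common normal-approximation formula for comparing two group means, n per group = 2 (z_{1-α/2} + z_{power})² / d². This Python sketch is an assumption-laden illustration rather than a method used in this study: the function name `n_per_group`, the two-sided α of 0.05, and the 80% power are choices made here, and the effect sizes are the MCID threshold and the range and quartiles of significant effects reported above.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Participants per arm for a two-arm comparison of means.

    Normal approximation for a two-sided test; d is the expected
    standardized mean difference (Cohen's d).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d**2)


# Effect sizes taken from the text: smallest significant effect, MCID
# threshold, lower quartile, median, upper quartile, and largest effect.
for d in (0.14, 0.2, 0.3, 1.02, 1.7, 2.91):
    print(f"d = {d:<4} -> about {n_per_group(d)} participants per group")
```

Because the normal approximation slightly underestimates the required numbers when groups are very small, the values for the largest effect sizes are best read as lower bounds. The contrast across the list shows how strongly the required sample size depends on the assumed effect.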

Conclusions
Our results indicate the potential of certain DMTs to positively impact QoL of PwMS. However, there seems to be no generally accepted standard for assessing disease-specific QoL in PwMS, and this end point is rarely the focus of DMT-evaluating trials. The critical importance of QoL for PwMS urgently needs to be better reflected in the design, registration, and reporting of future MS clinical trials.
This meta-research survey serves as a valuable resource for researchers, clinicians, and policymakers, promoting a deeper understanding of the interplay between DMTs and QoL in the context of MS.

Fig. 1
Distribution of p values (116 QoL results from 30 trials).
Note: The 116 QoL results come from 89 quantitative results and 27 reported as p values alone.
Abbreviations: DMT disease-modifying therapy, EQ-5D European Quality of Life Dimensions, FAMS Functional Assessment of Multiple Sclerosis, GHQ VAS General Health Questionnaire Visual Analog Scale, HAQUAMS Hamburg Quality of Life Questionnaire for Multiple Sclerosis, MSIS Multiple Sclerosis

Table 1
Summary characteristics of trials reporting on QoL effects of DMTs (n = 38)
*More than one category possible
**Single counts; see Supplementary file 2 for details
***For blinding of outcome assessment, see Table 3
**** ± 4 weeks
DMT disease-modifying therapy, IQR interquartile range, m months, M median, MS multiple sclerosis, n number, QoL quality of life

Table 2
Characteristics of 38 trials reporting on QoL effects of DMTs

*Including subscales
** ± 4 weeks
m months, MS multiple sclerosis, n number, NA not available, QoL quality of life, vs versus

Table 4
Reporting of QoL results and impact of DMT on QoL (n = 203 QoL results in n = 38 included trials)
*More than one category per trial possible
**Experimental and control has not been defined in one trial [37]
DMT disease-modifying therapy, n number, QoL quality of life