Background

The emergence of disease-modifying treatments (DMTs) for multiple sclerosis (MS) in the 1990s made it clear that longitudinal and structured collection of clinical data from MS care, including treatments and outcomes, would be required to assess long-term effectiveness and safety. Consequently, some pre-existing MS registries and databases initiated the collection of treatment information, such as in Denmark [1] and France [2], whereas new national MS registries were started in other countries, including Italy [3] and Sweden [4]. Moreover, MSBase, an international database collaboration, was established with the purpose of creating a global data collection platform regardless of nationality [5].

With time, it became clear that these MS registries successfully managed to collect high-quality longitudinal clinical information on large patient cohorts, contributing to a growing body of scientific literature of real-world evidence (RWE) in MS [6]. These studies focus on the epidemiology of MS, including incidence, prevalence, mortality, natural disease course and time trends. Importantly, registry data also contributes to pharmacoepidemiology, which includes comparative effectiveness and socioeconomic studies. Following that, it became evident that RWE in MS could have potential benefits for marketing authorization holders (MAHs) and regulators. Simultaneously, the European Medicines Agency (EMA) and the U.S. Food and Drug Administration (FDA) have released guidelines outlining how disease registries can serve as a foundation for regulatory determinations. Notably, EMA has identified MS as a disease in which this approach could be pioneered.

The establishment of the Big Multiple Sclerosis Data (BMSD) network (https://bigmsdata.org) in 2014 was made possible by an initial grant from Biogen. Since 2019, pharmaceutical companies in the MS field have been approached and asked for their willingness to support the current activities and development of BMSD. Six companies have supported BMSD for 3 years or more: Biogen, Bristol Myers Squibb, Merck, Novartis, Roche, and Sanofi. For 2023, five companies supported BMSD.

At the start, the network included the national MS registries of Denmark, France, Italy, and Sweden as well as the international MSBase registry. The national MS registry of the Czech Republic [7], previously represented by MSBase, now participates as an individual registry within the network, bringing the number of participating registries to six (Fig. 1a, b). The BMSD network is made up of well-developed registries, with reasonable coverage of local MS patients, providing a reliable framework for the network and containing data from a large number of people with MS (Table 1). Each of these registries is well established as a data source for multiple scientific publications over the years (see Table 2). The plan for the future is to include more registries in the network, provided that they meet the expected criteria for BMSD.

Fig. 1

a Map of MS registries: the national MS registries of the Czech Republic, Denmark, France, Italy and Sweden, and the international MSBase registry, based in Australia. b Map of MSBase registries

Table 1 BMSD MS registries
Table 2 Three selected recent publications from each of the BMSD registries with links to complete publication lists

In its early phase, BMSD mapped the member registry datasets to a minimum data set and common data model (CDM) of variables, definitions and data structure and addressed the many formal challenges of data sharing that include ethical, legal and governance aspects. As a result, data sharing and pooling were demonstrated to be feasible, leading to the execution of a series of demonstrator projects utilizing pooled data. This resulted in the publication of a number of papers thus far, with a specific focus on the long-term effectiveness of DMTs, progressive MS, and the analysis of discontinuation patterns over time [8,9,10,11].

The network’s aspiration is to harness the data from over 250,000 MS patients provided by the participating registries, thereby creating an unparalleled sample size for collaborative analysis. This vast amount of data holds the potential to yield valuable insights and findings that would otherwise be unattainable. This may be especially valuable in the context of uncommon events such as rare serious adverse events (SAEs), but also for analyses of subgroups of patients under-represented in clinical trials (e.g. children and the elderly, or patients with specific comorbidities such as cancer). Over the past decades, MAHs and regulatory organisations, such as EMA, have begun to recognise registries as a potentially useful data source, especially in the context of post-authorisation safety studies (PASS) and post-authorisation effectiveness studies (PAES). BMSD is in the process of seeking an EMA qualification opinion for PASS and has received Scientific Advice and a letter of support from EMA (https://www.ema.europa.eu/en/documents/other/letter-support-performing-registry-based-post-authorisation-safety-studies-pass-multiple-sclerosis-ms-using-data-big-ms-data-network-bmsd_en.pdf). All BMSD partners are currently contributing to PASS projects, and a qualification opinion would empower BMSD to take further responsibility for such regulator-demanded studies.
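The value of sample size for rare SAEs, noted above, can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from any BMSD protocol: it assumes independent patients who share a single per-patient risk, and the function names, cohort sizes and risk value are hypothetical. Under those assumptions, the probability of observing at least one event among n exposed patients is 1 - (1 - p)^n, and the "rule of three" gives an approximate 95% upper bound of 3/n on the risk when no events are observed.

```python
def prob_at_least_one_event(n_patients: int, risk: float) -> float:
    """Probability of seeing at least one event among n independent patients.

    Assumes every patient has the same per-patient risk (a simplification).
    """
    if n_patients < 0 or not 0.0 <= risk <= 1.0:
        raise ValueError("n_patients must be >= 0 and risk must lie in [0, 1]")
    return 1.0 - (1.0 - risk) ** n_patients


def rule_of_three_upper_bound(n_patients: int) -> float:
    """Approximate 95% upper bound on the risk when zero events are observed."""
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    return 3.0 / n_patients


if __name__ == "__main__":
    hypothetical_risk = 1e-4  # illustrative value: 1 event per 10,000 exposed patients
    for cohort_size in (2_000, 50_000):  # illustrative single-registry vs pooled sizes
        chance = prob_at_least_one_event(cohort_size, hypothetical_risk)
        print(f"n={cohort_size}: P(at least one event) = {chance:.3f}")
```

With these illustrative numbers, a cohort of 2,000 exposed patients has roughly an 18% chance of containing even one such event, whereas a pooled cohort of 50,000 has a chance above 99%. Real PASS analyses would additionally account for follow-up time, exposure definitions and confounding.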

Data collection

The data collected by the respective registries and their governance frameworks are the result of many years of development and have evolved by consensus within each registry organization. While the registries operate separately, they have all developed models of long-term success. Despite the independent nature of data collection, the similarities between data collected within the core dataset are striking. These similarities reflect a common aim to include variables that hold clinical significance. Additionally, all registries have developed high-quality data visualisation tools alongside their data entry modules, which support neurologists in daily care and aid decisions related to individual patients, as well as providing data for research and other types of studies. Table 3 shows a core set of variables available from all registries. Although the data collection within each registry is subject to its respective governance bodies, common needs within certain collaborations have occasionally prompted agreement to include additional data in the collection, for instance during the COVID-19 pandemic.

Table 3 Description of core variables available from BMSD registries

The support of EMA and FDA for using patient registries as a basis for post-approval studies, mainly PASS, has made safety a strong focus of BMSD in recent years. Accordingly, together with a group of pharma representatives from Biogen, Bristol Myers Squibb, Merck, Novartis, Roche, and Sanofi, BMSD has developed a core protocol for PASS. This includes a core dataset that all MS registries taking part in BMSD will be expected to follow, importantly including reported SAEs. SAEs are routinely collected in all the contributing MS registries in connection with specific treatments and can potentially be compared to unexposed groups. It is important to highlight that the SAE information collected by the individual registries has already been directly reported to the corresponding medical products agencies (MPAs) through parallel mechanisms. As a result, this data is classified as secondary and is not subject to pharmacovigilance reporting requirements. That responsibility remains with the treating physicians as legally specified, but the registry IT platforms may indeed help identify SAEs and alert physicians to report in a routine manner. The registries aim to classify SAE information using MedDRA terms when possible, and efforts to put this in place are ongoing. Although all the registries collect some pregnancy outcome variables, some, like those in the Scandinavian countries, receive this information by linkage to public registries.

All BMSD registries are designed to collect SAEs. As an example, progressive multifocal leukoencephalopathy (PML), which is associated with some MS treatments, is expected to be reported. Data items that could either improve the data collection or be of importance for risk stratification, such as lymphocyte counts (which are associated with the risk of PML during dimethyl fumarate exposure), are reported by some registries and could be relevant to propose as new data items for the other registries. Relevant new data items will be adopted over time. In fact, all BMSD registries have recently improved their collection of SAEs by adding specific questions answered at each visit/contact regarding malignancies, non-melanoma skin cancers and severe or immunosuppression-related infections (exemplified by herpes zoster). This shows that BMSD can adopt, in a coordinated fashion, a relevant change in data collection in response to the needs of safety studies.

Typically, BMSD registries are not expected to collect non-serious adverse events, such as gastrointestinal symptoms associated with some DMTs. It is uncertain to what extent non-serious but clearly treatment-related events can be efficiently collected even if considered relevant, as the burden of data collection within contributing centres is considerable.

Data management and analysis

BMSD is a collaboration of independent MS registries and clinical outcomes databases designed to address a wide range of clinical, pharmaceutical and epidemiological research questions using a flexible CDM. Datasets are set up on a project-by-project basis, and a project-specific statistical analysis plan is developed for each, based on a core CDM. Furthermore, BMSD has the facilities to securely manage and store patient-level data and also intends to set up a repository of data counts on key variables which will be updated periodically.

The large collection of data from all the BMSD registries creates a very rich combined dataset of over 250,000 patient records. In the initial studies, datasets from the respective registries were merged into a common database before analysis. Such pooling, when possible, greatly expands both study power and the range of potential statistical methods readily available for analysis. However, national and international legislation could limit direct data sharing in the future. Therefore, the BMSD registries are also scoping a federated data analysis approach, which offers the benefit of joint analysis of data across several data sources without data leaving the local sites, so that legal complexities can be avoided. This encompasses descriptive statistics as well as more advanced statistical modelling such as regression analysis. The latter, then referred to as federated learning, often requires multiple iterations of analysis that need to be well coordinated and simultaneous. The challenges primarily stem from the absence of established frameworks for numerous statistical models and from practical limitations, including the presence of firewalls. Consequently, further development is required to overcome these obstacles and refine this approach.
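The simplest case of the federated approach described above, descriptive statistics, can be sketched as follows. This is a minimal illustration and not BMSD's implementation: the class and function names are hypothetical, and the example values are invented. Each site computes three aggregates locally (count, sum, sum of squares), and a coordinator combines them into a pooled mean and sample variance without ever seeing patient-level rows.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class SiteSummary:
    """Aggregates a site shares instead of patient-level data (hypothetical)."""
    n: int
    total: float
    total_sq: float


def summarize_locally(values: Sequence[float]) -> SiteSummary:
    """Runs inside each registry; only the three aggregates leave the site."""
    return SiteSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def combine(summaries: Iterable[SiteSummary]) -> Tuple[float, float]:
    """Coordinator step: pooled mean and sample variance from site aggregates."""
    items = list(summaries)
    n = sum(s.n for s in items)
    if n < 2:
        raise ValueError("need at least two observations across all sites")
    total = sum(s.total for s in items)
    total_sq = sum(s.total_sq for s in items)
    mean = total / n
    # Sum-of-squares identity; fine for a sketch, numerically fragile at scale.
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


if __name__ == "__main__":
    site_a = summarize_locally([1.0, 2.0, 3.0])  # invented values
    site_b = summarize_locally([4.0, 5.0])       # invented values
    print(combine([site_a, site_b]))
```

The result is identical to what a pooled analysis would give, which is why descriptive statistics are the easy case. Regression models generally cannot be reduced to a single round of aggregates in this way, which is where the multiple coordinated iterations mentioned above come in. A production setup would also need to consider whether small counts in the shared aggregates could be disclosive.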

Whether using a pooled dataset or a federated approach, data need to be harmonized between the data sources and organized in a CDM. A major effort is therefore under way to create a more complete CDM; this work is now being finalized and will be published in the coming year. The basis of the CDM is the BMSD data dictionary, which contains close to a hundred items with agreed-upon common definitions and descriptions. In addition, we have developed BMSD CDM software which translates a local database into the BMSD data format and generates two reports (to be published): one on the success of the data transformation and one on data quality in terms of data density and completeness. These tools will be applied systematically to assess quality issues within and between the BMSD registries, as well as for registries seeking to join BMSD in the future.
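The kind of translation and quality reporting described above can be sketched in a few lines. The BMSD CDM software and data dictionary are not yet published, so everything here is an assumption made for illustration: the local field names, the CDM variable names, and the function names are all hypothetical. The sketch renames local fields to CDM variables, lists local fields that have no CDM counterpart, and reports per-variable completeness.

```python
from typing import Any, Dict, List, Set

# Hypothetical mapping from one registry's local field names to CDM variables.
LOCAL_TO_CDM: Dict[str, str] = {
    "pat_id": "patient_id",
    "dob": "birth_date",
    "onset": "ms_onset_date",
    "edss_score": "edss",
}


def to_cdm(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename mapped fields; fields absent from the local record become None."""
    return {cdm: record.get(local) for local, cdm in LOCAL_TO_CDM.items()}


def unmapped_fields(records: List[Dict[str, Any]]) -> Set[str]:
    """Local fields present in the data that have no CDM counterpart."""
    seen: Set[str] = set()
    for record in records:
        seen.update(record.keys())
    return seen - set(LOCAL_TO_CDM)


def completeness(cdm_records: List[Dict[str, Any]]) -> Dict[str, float]:
    """Share of records with a non-missing value, per CDM variable."""
    n = len(cdm_records)
    if n == 0:
        return {var: 0.0 for var in LOCAL_TO_CDM.values()}
    return {
        var: sum(r[var] not in (None, "") for r in cdm_records) / n
        for var in LOCAL_TO_CDM.values()
    }


if __name__ == "__main__":
    local_rows = [  # invented example rows
        {"pat_id": "A1", "dob": "1980-01-01", "onset": "2005-06-01",
         "edss_score": 2.0, "clinic": "X"},
        {"pat_id": "A2", "dob": None, "onset": "2010-03-15", "edss_score": 0},
    ]
    cdm_rows = [to_cdm(row) for row in local_rows]
    print("unmapped:", unmapped_fields(local_rows))
    print("completeness:", completeness(cdm_rows))
```

Note that a legitimate value of zero (for example an EDSS of 0) is counted as present; only `None` and empty strings are treated as missing. A real tool would also validate types, units and date formats against the data dictionary.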

BMSD aspires to pioneer further development of federated approaches for joint analyses of data, including federated learning, to allow more complex analyses without merging data. A further aim of BMSD is to actively promote the standardization of definitions and procedures in MS RWE research, including PASS. This will be a gradual process. Each registry will be required to harmonize its own data to the core CDM, which can be customized according to the requirements of each project and aligned across the participating registries. Once a consensus has been reached on the harmonization process, it will serve as a foundation for developing additional analytical principles.

Future perspectives

BMSD constitutes a network of MS registries working together since 2014 to provide an unparalleled real-world dataset for researchers, MAHs and regulatory bodies. BMSD will soon renew an application to EMA for a qualification opinion regarding PASS. If approved, this would provide standardised expectations for MS registries when participating in regulator-demanded studies as well as guidelines for registries interested in joining the BMSD network. Moving forward, BMSD aims to pursue a qualification opinion for PAES.

The notable advantage of BMSD lies in its possession of extensive, high-quality patient data, which allows the study of rare safety events, comparisons between countries and over time, and direct comparisons between different treatment exposures as the safety data is collected from all patients irrespective of DMT exposure.

Real-world data from patient registries inevitably differ from clinical trial data. Registry visits and tests are irregular, whereas trials conform to a very specific visit schedule. Source data verification is also usually not feasible or highly restricted. Further, a collaboration between registries from different populations using different IT platforms introduces dynamic heterogeneity in datasets, which needs to be handled by a well-developed data management routine and strong coordination of new or updated data fields.

Observational studies using MS registries provide opportunities for external validation of clinical trial data, head-to-head treatment comparisons and multi-year longitudinal assessments [12]. Such studies can assess treatment effectiveness and safety in treated populations that are usually excluded from clinical trials, such as people under the age of 18 or over 55, or people with prior comorbidities such as diabetes, cancer or serious mental health issues. BMSD amplifies these opportunities further through its sample size, collegiate leadership, and the support of a well-integrated network of statisticians and data managers. BMSD organized its first conference on statistical approaches in MS epidemiology in 2019 and a second in 2023, and has the expressed ambition to contribute to the development of this field of investigation.

It is a clear ambition of BMSD to include more MS registries in the future. Having spent time and effort to define a common scope and properties, and to harmonize variables and definitions in a common data model which also provides means of assessing data quality and density, BMSD will expect MS registries wanting to join to demonstrate fitness for purpose at a similar level to the current six registries. It is our impression that some non-BMSD MS registries are already qualified to join, but a review has not yet been initiated.

In conclusion, a well-established network of MS registries offers substantial advantages for real-world data analysis, including comparative effectiveness and comparative safety and pregnancy outcomes studies. The very large sample size allows the exploration of causality and association for rare events. BMSD is already making valuable contributions to clinicians, researchers, MAHs, and regulators, with the ultimate aim of better outcomes for people with MS.