Key Points

A survey among industry and other stakeholders was used to investigate the key aspects of rare disease registries to support regulatory decision making.

A set of demographic, clinical and medication-related data elements was identified that focused primarily on the disease of interest, with much less emphasis on co-morbidities or adverse events.

Compared with respondents from industry, other stakeholders found it less relevant to share data with industry and less acceptable for a registry to be financed by industry.

1 Introduction

There is a large unmet medical need for effective treatments for the 30 million patients in Europe who suffer from one of 6000–8000 rare diseases [1]. The search for new treatments in rare diseases is challenging, owing to the small and heterogeneous patient populations and the often limited knowledge of the diseases’ natural history [2]. Because of the scarcity of patients and ethical concerns about withholding potentially beneficial active treatment, well-designed controlled clinical trials to assess the efficacy of a medicinal product and to detect serious adverse events can be difficult to perform [3]. Marketing authorisation of therapies for rare diseases is therefore typically based on less evidence than for more common disorders. Consequently, post-marketing activities, such as registry-based studies, that further evaluate the effectiveness and/or safety of a new medicinal product are crucial in this patient population. Moreover, disease registries can provide data on the natural course of a disease, which may be used as a historical or external control to support a marketing application based on single-arm trial data. However, the contribution of registries to the knowledge of a new medicinal product is currently limited by slow patient accrual, delayed start of patient inclusion, low data quality and missing data [4,5,6].

In Europe, governance organisations are aware of these problems, which lead to suboptimal use of registries, and have introduced initiatives to improve the contribution of registries to the assessment of the risks and benefits of treatments. The European Medicines Agency (EMA) initiated the Patient Registry Initiative [7], with the main goal of making better use of existing registries to support regulatory decision making. Among other activities, the Patient Registry Initiative organised meetings with stakeholders to discuss the importance of key elements of registries, including common data elements to be collected, data quality and governance aspects [8]. Attendees of these meetings were employees of the pharmaceutical industry, regulators, academics, registry owners and patient representatives. Recently, the European Network for Health Technology Assessment (EUnetHTA) developed the Registry Evaluation and Quality Standards Tool (REQueST) [9]. This tool assesses the quality of patient registries to support more systematic and widespread use of registry data in health technology assessments. In parallel, the European Reference Networks on rare diseases made recommendations to improve the quality of registries [10] that included key elements similar to those identified by the Patient Registry Initiative.

The primary aim of this study was to quantify stakeholders’ opinions about key elements of registries as data sources for studies that support regulatory decision making in the field of rare diseases. The secondary aim was to assess whether the importance attached to these key elements differed between industry stakeholders and other stakeholders.

2 Methods

2.1 Study Design and Participants

We conducted a web-based survey among stakeholders familiar with the use of registries in a regulatory context. People known via the EMA’s Patient Registry Initiative and/or registry owners identified via the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance were contacted. They received a link to the survey by e-mail, followed by a reminder, and could forward the e-mail to people in their network. The survey data were collected from April to October 2019. Ethical approval was not required because of the nature of the study.

2.2 Outcome Assessment

The survey was constructed in QualtricsXM (a tool for online surveys, https://www.qualtrics.com) and included 47 questions in total (Electronic Supplementary Material [ESM]). It was developed iteratively by members of the project team and of the EMA’s Patient Registry Initiative. The survey questions were based on the themes of the workshops held at the EMA, i.e. common data elements, data quality and governance, and recurrent issues as described elsewhere [8]. In the survey, a registry was defined as “an organized system that uses observational methods to collect uniform data on specified outcomes in a population defined by a particular disease, condition or exposure” [11].

In the survey, two questions assessed the characteristics of the respondent and two questions concerned registries in general. Most of the remaining questions concerned the three key elements of registries: 24 questions on common data elements, covering aspects considered essential for the collection of demographic and baseline data, treatment and safety outcomes, and duration of follow-up; ten questions on data quality, covering data entry, optimising and improving data quality, source data verification and missing data; and four questions on aspects of governance. Because this study aimed to assess the importance attached to these key elements in the context of post-marketing studies, five questions on registry-based studies were added. A registry-based study uses a registry infrastructure for patient recruitment and data collection [11].

The survey contained three types of questions: (1) multiple-choice questions, for which respondents could choose one or more answer options; (2) Likert scale questions with answer options ranging from (1) very unimportant to (5) very important; and (3) visual analogue scale (VAS) questions. The web-based version of the survey was pretested for functionality and content by seven persons working at the EMA and/or the Dutch Medicines Evaluation Board. After minor adaptations, the survey was distributed.

2.3 Analyses

Respondents who completed ≥ 80% of the survey were included in the analyses. Descriptive analyses were conducted, and results of the multiple-choice and Likert scale questions are presented as numbers and percentages for all included respondents and per stakeholder group (i.e. industry vs other stakeholders). Results of the VAS questions are presented as medians with interquartile ranges (IQR).
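The inclusion rule and the median (IQR) summaries can be illustrated with the short Python sketch below. It is not the analysis code of this study, which was run in SPSS. We assume that completion is measured as the share of questions answered and that skipped questions are coded as `None`; the function names and respondent data are hypothetical. Quartiles use the default ("exclusive") method of Python's `statistics.quantiles`, and other packages may use a different percentile definition and therefore give slightly different IQRs.

```python
import statistics

def answered_count(answers):
    """Number of questions answered; a skipped question is coded as None (assumption)."""
    return sum(1 for a in answers if a is not None)

def include_respondent(answers, min_completion_pct=80):
    """Inclusion rule: keep respondents who completed >= 80% of the survey.

    Integer arithmetic avoids floating-point edge cases exactly at the cut-off."""
    return answered_count(answers) * 100 >= min_completion_pct * len(answers)

def median_iqr(values):
    """Return (median, Q1, Q3) for a list of VAS scores (0-100)."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return median, q1, q3

# Hypothetical respondents: one answer per question (None = skipped).
respondents = {
    "r1": [4, 5, 3, 5, 4],
    "r2": [4, None, None, 2, None],
}
included = [rid for rid, ans in respondents.items() if include_respondent(ans)]
print(included)                                   # ['r1']
print(median_iqr([10, 20, 30, 40, 50, 60, 70]))   # (40.0, 20.0, 60.0)
```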

Differences in responses between industry and other stakeholders were tested using Pearson \(\chi^{2}\) tests for outcomes measured with multiple-choice questions and Mann–Whitney U tests for outcomes measured with Likert scale or VAS questions. Given the large number of questions and tests performed, p-values < 0.01 were considered statistically significant. An element was considered important if ≥ 80% of the respondents rated it as important or very important. This cut-off has been used previously [12] and served here to separate “need to know” elements from “nice to know” elements. The same cut-off was applied to the multiple-choice questions and the VAS. Data were analysed using IBM SPSS Statistics 20 (IBM, Armonk, New York, United States). Microsoft Excel 2010 (Microsoft, Redmond, Washington, United States) was used for graphical presentation of results.
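The 80% importance rule and a 2 × 2 Pearson \(\chi^{2}\) comparison between industry and other stakeholders can be illustrated with the Python sketch below. It assumes that Likert answers are coded 1–5 (4 = important, 5 = very important), that skipped answers are excluded from the denominator, and that the \(\chi^{2}\) statistic is computed without continuity correction. The function names are hypothetical and the counts in the usage lines are invented for illustration; they are not study data.

```python
import math

def is_important(likert_scores, cutoff_pct=80):
    """'Need to know' element: >= 80% of non-missing answers are 4 or 5."""
    answered = [s for s in likert_scores if s is not None]
    if not answered:
        return False
    n_important = sum(1 for s in answered if s >= 4)
    return n_important * 100 >= cutoff_pct * len(answered)

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2 x 2 table

                      important   not important
        industry          a             b
        other             c             d

    Assumes every row and column total is non-zero.
    Returns (chi-square statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

SIGNIFICANCE_LEVEL = 0.01  # stricter level because of the many tests performed

print(is_important([5, 4, 4, 5, 3]))        # True (80% rated 4 or 5)
stat, p = pearson_chi2_2x2(10, 0, 0, 10)    # invented counts
print(round(stat, 2), p < SIGNIFICANCE_LEVEL)  # 20.0 True
```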

3 Results

Of the 201 persons who opened the survey, 73 (36%) completed ≥ 80% of it. The median time to complete the survey was 26 minutes (IQR 18–59). Most respondents were employees of the pharmaceutical industry (n = 42; 57%). The other 31 respondents were employees of European regulatory authorities (n = 9; 12%), employees of health technology assessment agencies (n = 5; 7%), registry owners (n = 5; 7%), patient representatives (n = 3; 4%), physicians (n = 2; 3%) or did not specify their role (n = 7; 10%).

3.1 Overall Results

3.1.1 General Questions About Registries

In the respondents’ view, a registry should cover a minimum of 40% (median; IQR 28–60) of the patient population to guarantee a minimal representation of the disease population when used to support regulatory decision making (Fig. 1). Regarding the geographical spread of centres, most respondents considered it important to have centres in more than one European country (92%) and to have more than one clinical centre collecting data (90%) (ESM).

Fig. 1

Minimal patient coverage (%) that, in the respondents’ view, is needed to represent the disease population, by stakeholder group (all, industry, other stakeholders)

3.1.2 Common Data Elements

Demographic data considered important to be collected in registries were sex (99%), vital status (93%), age (88%) and current pregnancy (86%) (Fig. 2 and ESM).

Fig. 2

Results of the importance attached to the collection of various demographic data. BMI body mass index

Data categories that respondents considered necessary to collect were clinical data (96% of respondents), treatment data (96%), laboratory data (90%) and patient-reported outcomes (PROs; 82%). For clinical data, the elements considered important were (first date of) diagnosis, severity of the disease, physical function, organ damage and confirmation of the diagnosis; for treatment data, the medicinal product (93%) and the intervention (86%); for laboratory data, blood tests (88%) and biomarkers (86%); and for PROs, validated disease questionnaires (82%). Sixty-one percent of the respondents would collect only a limited set of baseline data or considered the collection of co-morbidity data unnecessary. Most respondents reported that they would base the diagnosis on clinical practice guidelines (73%) and its confirmation on a doctor’s recorded diagnosis (80%) (ESM).

For the medication used to treat the disease of interest, respondents indicated that the dosage (96%), the substance (90%), the reason for stopping or switching to another product (89%), and the start and stop dates (84%) should be captured (Table 1). According to 89% of the respondents, information on non-pharmacological interventions should also be captured. Seventy-four percent of the respondents would collect either no data or only a limited data set on medicinal products used to treat co-morbidities (ESM).

Table 1 Number (percentage) of respondents (all, industry and other) that considered the common data elements-related questions important\(^{a}\), with p-values of Pearson \(\chi^{2}\) tests for differences between industry and the other stakeholders

Respondents considered the following data on pregnant women to be of key importance: exposure to any medication during pregnancy (90%), the outcome of the pregnancy (90%), the trimester of exposure (84%) and follow-up of teratogenic events (84%) (Table 1). Fewer than half of the participants (44%) would collect details of the partner’s medication use, either before or during pregnancy (ESM).

Treatment outcomes that respondents considered important to collect pertained to clinical (97%), treatment (96%), laboratory (89%) and PRO (86%) data. Respondents would base efficacy outcomes primarily on clinical practice (73%), EMA guidelines (69%) and evidence-based literature (69%), but none of these reached the predefined 80% threshold. A doctor’s recorded diagnosis (81%) was considered most relevant for confirmation of endpoints. Seventy-eight percent of the respondents indicated that only validated endpoints should be used. Most respondents thought that endpoints measuring disease progression should be monitored at least quarterly (43%) or twice a year (31%) (ESM).

For safety outcomes, 64% of the respondents indicated that adverse events of special interest should be collected, 62% serious adverse events, and 42% all adverse events. When an adverse event is reported, all elements were considered important to record: severity (97%), duration (85%) and, if applicable, causality assessment (85%) (Table 1).

Fifty-four percent of the respondents indicated that patients should be followed up for 1–5 years. During follow-up, use of a medicinal product (89%), whether a patient was lost to follow-up (88%) and the underlying reason (82%) should be captured (ESM).

3.1.3 Data Quality

According to the respondents, data are currently mostly entered into registries through web-based platforms (46%) or imported from electronic health records (34%). In order of preference, data should be entered by trained staff (75%), treating physicians (60%), patients themselves (58%) or a study coordinator (42%). Data entry at the time of the actual patient visit was preferred (75%) (ESM). To improve data quality, it was considered of prime importance to have data collection instructions (96%), use appropriate software (94%), have well-trained staff (94%) and use standard terminology (90%). To minimise missing data, respondents considered automated queries (97%), maximising data import from electronic health records (92%) and mandatory fields (90%) to be important. For improving consistency and/or accuracy, alerts (94%) and missing fields over time (80%) were considered important. Regular annual checks (78%) were strongly preferred over random checks (17%) (ESM). Respondents indicated that, to ensure data quality, a median of 30% (IQR 10–54) of source data should be verified and that up to a median of 20% (IQR 10–25) missing data for key values could be acceptable (Figs. 3 and 4).
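To illustrate how the missing-data level that respondents found acceptable could be checked in a registry extract, the Python sketch below computes the percentage of records missing each key variable and flags variables above 20%. The record layout (one dictionary per patient), the variable names and the treatment of absent keys as missing are assumptions for illustration only; which variables count as key values would have to be defined per registry.

```python
def missing_pct_by_variable(records, key_variables):
    """Percentage of records in which each key variable is missing (None or absent)."""
    n = len(records)
    return {
        var: 100 * sum(1 for rec in records if rec.get(var) is None) / n
        for var in key_variables
    }

def variables_above_threshold(records, key_variables, max_missing_pct=20):
    """Key variables whose share of missing values exceeds the acceptable level."""
    n = len(records)
    flagged = []
    for var in key_variables:
        n_missing = sum(1 for rec in records if rec.get(var) is None)
        # Integer form of: n_missing / n > max_missing_pct / 100
        if n_missing * 100 > max_missing_pct * n:
            flagged.append(var)
    return flagged

# Hypothetical registry extract with three illustrative key variables.
records = [
    {"sex": "F", "vital_status": "alive", "diagnosis_date": "2015-03-01"},
    {"sex": "M", "vital_status": "alive", "diagnosis_date": None},
    {"sex": "F", "vital_status": None, "diagnosis_date": None},
    {"sex": "M", "vital_status": "alive", "diagnosis_date": "2017-06-12"},
    {"sex": None, "vital_status": "deceased"},
]
key_vars = ["sex", "vital_status", "diagnosis_date"]
print(missing_pct_by_variable(records, key_vars))
# {'sex': 20.0, 'vital_status': 20.0, 'diagnosis_date': 60.0}
print(variables_above_threshold(records, key_vars))  # ['diagnosis_date']
```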

Fig. 3

Percentage of source data verification that respondents considered necessary, by stakeholder group (all, industry, other stakeholders)

Fig. 4

Percentage of missing data that respondents considered acceptable, by stakeholder group (all, industry, other stakeholders)

3.1.4 Governance

The availability of a central contact point (96%) and data sharing across countries (86%) were considered important. Most respondents considered it relevant for regulatory decision making that registry data are shared with regulatory authorities (94%) and academic centres (85%). In addition, most respondents found it acceptable for registries to be financed by regulatory authorities (92%) or academia (83%), but fewer found financing by pharmaceutical companies (78%) or patients (53%) acceptable (Table 2). Moreover, respondents indicated that a median of 92% (IQR 81–100) of the registry data should be FAIR (Findable, Accessible, Interoperable and Reusable; Table 2).

Table 2 Number (percentage) of respondents (all, industry and other) that considered the governance-related questions important\(^{a}\), with p-values of Pearson \(\chi^{2}\) tests for differences between industry and the other stakeholders

3.1.5 Registry-Based Studies

For registry-based studies, respondents found it important that a common study protocol is available for a multi-centre registry (92%) and that the registry-based study protocol is recorded in the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance database (84%). The primary objective (97%), the secondary objective (90%) and a statistical analysis plan (85%) should be predefined in the study protocol. According to the respondents, the statistical analysis plan should address missing data, analysis strategy, bias, treatment discontinuation, confounders and effect modifiers. Performing a randomised study within a registry-based study was considered less essential (selected by 61% of the respondents) (ESM).

3.2 Industry vs Other Stakeholders

A statistically significant difference between industry and other stakeholders was observed for six of the 47 questions. The patient coverage needed to guarantee a minimally acceptable representation of the disease population within the registry was lower for industry respondents than for other stakeholders (median 30% [IQR 20–50] vs 56% [IQR 32–78], p < 0.01) (Fig. 1). Compared with other stakeholders, industry respondents found exposure to any medication during pregnancy more important to register (100% vs 76%, p < 0.01) (Table 1); would less often use evidence-based literature for selecting the common data elements about the disease (52% vs 84%, p < 0.01) (ESM); rated the possibility of requesting additional information from a treating physician as more important (86% vs 52%, p < 0.01); found it more relevant to share data with pharmaceutical companies (90% vs 45%, p < 0.01); and found it more acceptable for the registry to be financed by pharmaceutical companies (95% vs 53%, p < 0.01) (Table 2).
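The comparison of minimal patient coverage between groups is one of the outcomes tested with a Mann–Whitney U test. The stdlib-only Python sketch below shows the large-sample (normal-approximation) version of that test, with average ranks for ties and a tie-corrected variance but no continuity correction. It is illustrative only: the function name and the input values are hypothetical, and exact or continuity-corrected p-values reported by statistical packages can differ, particularly for small samples.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, large-sample normal approximation.

    Ties receive average ranks and the variance of U is tie-corrected;
    no continuity correction is applied. Both samples must be non-empty.
    Returns (U for sample x, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted((value, group) for group, sample in enumerate((x, y)) for value in sample)
    ranks = [0.0] * n
    tie_term = 0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of positions i..j
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values tied: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p_value

# Hypothetical scores (not study data).
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(p, 4))  # 0.0 0.0495
```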

4 Discussion

This study identified the key aspects of rare disease registries, in terms of common data elements, data quality and governance, that stakeholders considered important (rated important or very important by ≥ 80% of respondents). A set of demographic, clinical and medication-related data elements was identified that focused primarily on the disease of interest, with much less emphasis on co-morbidities or adverse events. Respondents considered that verifying 30% of source data and accepting up to 20% missing data would provide acceptable data quality. Regarding governance, the availability of a central contact point and the ability to share data with regulatory authorities were considered important for disease registries to support regulatory decision making in the setting of rare diseases. Regarding registry-based studies, thorough, predefined epidemiological research protocols were expected, with less emphasis on the need for randomised designs. There were few differences between industry and other stakeholders. With regard to governance, other stakeholders found it less relevant to share data with industry and less acceptable for a registry to be financed by industry.

A core common data set is essential for the interoperability of registries, allowing the exchange of data [13]. Previously, the European Platform on Rare Disease Registration released a set of common data elements [14], which included date of birth, sex and vital status. Our study confirmed the importance of collecting these elements and showed that stakeholders also consider it important to collect data on pregnancy. The choice of key disease-related data elements may, however, differ depending on the therapeutic area or patient population [15]. To prevent inconsistency in how elements are captured, clear definitions need to be formulated [13]. In this context, the core data sets defined by the European Society for Blood and Marrow Transplantation Registry and the European Cystic Fibrosis Society Patient Registry, which have been shown to support regulatory decision making, can be used [16,17,18,19].

In our study, fewer than half of the respondents (42%) indicated that all adverse events should be collected. During one of the previous meetings organised by the EMA’s Patient Registry Initiative, routine collection of adverse events was described as a burden [20]. Most registries focus on the collection of serious adverse events and/or adverse events of special interest [20]. One example is the TREatment of ATopic eczema (TREAT) Registry, which has the secondary objective of collecting eye disorders and eosinophilia as specific types of adverse events [21]. Another example is the REGIMS Registry, which aims to assess the incidence, type and consequences of side effects of multiple sclerosis immunotherapies [22]. Requirements for post-approval safety data management are described in guideline E2D of the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) and in Module VI of the guideline on good pharmacovigilance practices [23, 24]. For solicited cases, companies are obliged to perform a causality assessment and to submit causally related adverse drug reactions to the relevant authorities [25, 26]. Although registries could be an important source for evaluating long-term drug effects, including safety outcomes that are often incomplete at the time of drug approval, the way such data are collected within most disease registries does not allow a causality assessment in line with ICH guidance [27]. If registries are used as a data source for post-authorisation safety studies, the expectations with respect to adverse event collection, management of follow-up information, causality assessment and, where appropriate, reporting timelines should be made clear. Registries should provide accurate and timely data, including follow-up information, on serious adverse events to enable a causality assessment.

Good-quality data are crucial for a thorough benefit-risk evaluation of medicinal products. Our survey showed that respondents considered up to 20% missing data acceptable. A threshold of 10% missing data, based on a sample of 200 patients from the registry, was suggested by participants of one of the EMA disease-specific registry workshops [20]. It is recognised that most patient registries will have at least some missing data; approaches to minimise the amount of missing data should therefore be considered as part of the registry protocol and analysis plan.

According to the guideline for good clinical practice of the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use, clinical trials should be monitored to verify that reported data are accurate, complete and accounted for by source records [28]. However, no guidance is given on what proportion of data should be verified. It is acknowledged that even 100% source data verification does not guarantee an error rate of 0% [29]. A risk-based approach combined with reduced source data verification could be a good way to verify the data [30,31,32]. Participants of one of the EMA disease-specific registry workshops suggested source data verification for 10% of the registry data [33]. This implies that the 30% level for data verification indicated by our respondents may not be feasible in practice. The results of our survey could provide a starting point for discussing which data, and how much, should be verified to guarantee data validity that is acceptable to all potential stakeholders.
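A simple random selection of records for source data verification at a chosen level can be sketched in Python as follows; for a registry sample of 200 patients, 30% corresponds to 60 records and 10% to 20. This is a sketch under stated assumptions: the function name, the patient identifiers and the fixed seed (used to make the selection reproducible and auditable) are hypothetical, and simple random sampling is only one option. A risk-based approach, as discussed above, would target records or variables by risk rather than sampling uniformly, and is not implemented here.

```python
import random

def sdv_sample(record_ids, verify_pct=30, seed=2019):
    """Select a simple random sample of records for source data verification.

    The sample size is verify_pct percent of all records, rounded up, so that
    at least the requested share is verified (verify_pct must be 0-100)."""
    ids = list(record_ids)
    k = (verify_pct * len(ids) + 99) // 100  # integer ceiling of pct * n / 100
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    return sorted(rng.sample(ids, k))

patients = [f"P{i:03d}" for i in range(1, 201)]  # hypothetical IDs P001..P200
selected = sdv_sample(patients, verify_pct=30)
print(len(selected))  # 60
```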

To our knowledge, this is the first study to assess, on a larger scale, the importance that stakeholders attach to key aspects of registries in the field of rare diseases for regulatory decision making. A limitation of the study is that, of the 201 participants who received and opened the survey, 82 (41%) did not respond to any question, 46 (23%) stopped after completing only a few questions, and 73 (36%) completed ≥ 80% of the survey and were included in the analyses. The reasons for drop-out are unknown; however, because most non-completers (82 of 128) did not answer a single question, problems with navigating through the survey or with its content are an unlikely explanation. It should also be noted that the survey was pretested for functionality and content among only a small number (n = 7) of regulators and that no formal validation procedures were applied. Although respondents were allowed to skip questions, for instance if a question was unclear to them or not applicable, some questions may have been interpreted differently by different respondents. Additionally, the generalisability of our findings to a wider population should be assessed in future studies, because the individuals included in our study probably had a particular interest in registries, having participated in the EMA workshops or being connected to the European Network of Centres for Pharmacoepidemiology and Pharmacovigilance, and only a small and heterogeneous group of other stakeholders was included. Furthermore, we used a cut-off of 80% to indicate importance for both the Likert scale and the multiple-choice questions. Although this cut-off has been used previously [12], it remains somewhat arbitrary.

5 Conclusions

This study showed that opinions on data and governance are well aligned across stakeholder groups and that data and governance issues on their own should not pose a barrier to collaboration. This finding supports the EMA’s efforts to encourage stakeholders to work with existing registries when collecting data to support regulatory decision making.