Background

The European Union (EU) acknowledges the relevance of registries as key instruments for developing rare disease (RD) clinical research and for improving patient care and health service (HS) planning [1, 2]. The European Commission has funded EPIRARE and other projects on EU patient registration, and has stated that its strategic objective is the creation of the European Platform for RD patient registration (RDR), providing common services and tools for the existing (and future) rare disease registries in the EU [3]. The EPIRARE project (“Building Consensus and Synergies for the EU Registration of Rare Disease Patients”, http://www.epirare.eu) studied a model for this platform [4] and concluded that it should play an important role in improving standardization and data comparability and, where useful, in supporting the setting up of new registries. Actual data collection by the platform should be limited to diseases for which disease-specific registries are not sustainable or for which there is no specific research interest. This article presents the results of the EPIRARE project in defining a set of common data elements (CDE) for the European RDR Platform. European or wider data sharing would be desirable to increase the power of data analyses, but it might be severely hampered by the forthcoming regulation on personal data protection, which is currently under discussion in the EU Parliament. Even without data sharing, reference to the European RDR Platform CDE by new and existing registries will improve the comparability of data and indicators.

Methods

In line with recommended methodologies [5], a reference list of registry-based indicators was first defined, starting from the indicators identified by the EUROPLAN project [6] and the EU Rare Disease Task Force (RDTF) [7]. Some indicators were slightly modified or added in consideration of the opinions expressed by the RDTF experts and of the information needs of the identified stakeholders, as resulting from the surveys [8, 9] and consultations [4] carried out during the EPIRARE activities. The experts who reviewed the cited RDTF document and the EUROPLAN Working Group on indicators are listed in the cited documents. The process of selecting the addressees of the EPIRARE surveys and consultations is described in the cited references. More detailed information on the respondents and on the EPIRARE advisory board members is presented, respectively, in the deliverables and partners sections of the EPIRARE project website (http://www.epirare.eu). The resulting set of variables necessary for the computation of these indicators was compared with information on institutional initiatives for national RD registries, already established or in preparation, which were notified to EPIRARE by experts in Belgium, Bulgaria, France, Germany, Italy and Spain, in order to achieve the highest consistency among EU registries. The definitions and formats of the selected variables were kept as similar as possible to the data elements used in the US NIH Global Rare Disease Registry to facilitate possible collaborative work. Finally, the peculiarities of some variables and of their collection were also considered in elaborating the proposed organization of the CDE set.

Results and discussion

The set of reference indicators

The set of rare disease indicators used in this study as the reference for the selection of the CDE is reported in “Additional file 1”. These indicators range from disease surveillance to socio-economic burden, HS monitoring, research and product development, and policy equity and effectiveness. The indicators mentioned in the research area have generic definitions, but stand for the many possible indicators which may be defined for specific goals, mostly depending on clinical data. “Additional file 1” also reports the variables which were considered necessary for the computation of each indicator.

Specific features of groups of variables

Beyond their role in the computation of sound platform indicators and other information outputs, some variables are particularly important for making the best use of registry data. These comprise: a) an unambiguous universal patient coding; b) the variables allowing indicator analysis by diagnosis, by geographic location of the patient and by health care services used by the patient; and c) the variables allowing the ethical processing of patient data, including the patient's willingness to participate in research.

The set of common data elements and its organization

Following the results of the analysis described above and in line with the cluster analysis of the scope of data collection by registries with different aims (Santoro M, Coi A, Lipucci Di Paola M, Gainotti S, Mollo E, Taruscio D, Vittozzi L, Bianchi F: A classification of the Rare Disease Patient Registries aimed at identifying different informative needs, submitted), the data elements were organized in three different domains (Table 1). The first domain aims mainly at facilitating the completeness of case notification and includes the case identification, the geographical location of the patient and of the services involved in the patient's treatment, as well as information on the patient's position regarding his/her participation in research. This is the minimum information necessary to characterize the case and most of it is collected in usual medical practice; therefore, it is proposed as the mandatory set of data elements. It consists of data which are known to the patient (or their family) and which can be entered without the involvement of physicians or of the health services which follow the patient. Although validation of patient-reported data may be recommended before their inclusion in the database, this additional source, by promoting case notification to registry holders, may increase the sensitivity of the registration system and also allow sensitivity estimates. Finally, this data set provides information on the patient distribution and on the dimension of the problem, and is of use for HS and clinical trial planning, for the prioritization of product development and for patient advocacy. The variables necessary to compute a unique patient code (EU GUID) were selected following the results of Johnson et al. [10]. However, to improve coding accuracy in a global context with multiple languages and alphabets, it is considered necessary that EU registry sources collect two additional elements for the EU GUID elaboration: the country of birth, which is already collected in the US-GRDR [11], and the national unique identification code.
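
As an illustration of how a unique patient code can be derived from identifying variables, the following minimal Python sketch hashes a normalized combination of fields. It is not the EU GUID algorithm, which is not specified in this article. The functions `normalize` and `patient_code`, the normalization rules, the choice of SHA-256 and the field list (apart from the country of birth and the national identification code named above) are all assumptions made for the example.

```python
import hashlib
import unicodedata


def normalize(value: str) -> str:
    """Reduce spelling variation: strip accents, upper-case, keep letters and digits only."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in without_marks.upper() if ch.isalnum())


def patient_code(first_name: str, last_name: str, birth_date: str, sex: str,
                 country_of_birth: str, national_id: str) -> str:
    """Return a deterministic code for one patient (hypothetical field list)."""
    fields = [first_name, last_name, birth_date, sex, country_of_birth, national_id]
    # A separator keeps adjacent fields from running into each other.
    payload = "|".join(normalize(field) for field in fields)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two spellings of the same (fictitious) patient yield the same code.
code_a = patient_code("José", "Müller", "1980-02-29", "M", "ES", "X1234567")
code_b = patient_code("jose", "MULLER", "1980 02 29", "m", "es", "x-1234567")
print(code_a == code_b)
```

The sketch only removes diacritics and punctuation. It does not transliterate between alphabets, which is one reason why language-independent elements such as the country of birth and the national identification code help to stabilize the code.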

Table 1 The EPIRARE set of common data elements for the European RDR platform

The second domain of the platform data elements aims at characterizing the patient's risk factors and at monitoring and planning the operation of the health services. It extends the patient characterization with genetic data and with data regarding his/her health status and familial information. Moreover, this domain includes data regarding the history and status of diagnosis and treatments. This information can be collected from a variety of sources and requires specific methodological expertise for data collection and for use in HS research.

The third domain aims at supporting outcome analysis. It includes data on patient death; on health-related quality of life (HRQoL), education level attained and occupational status, for an integrated assessment of the patient's condition; and on co-morbidity and other symptoms which are observed and may be associated with the case disease and treatments. The assessments of the education level attained, occupational status and HRQoL, which are not usually of interest to pathology registries, require the administration of a short questionnaire. These data are extremely important, since many RD do not affect life expectancy, and can serve many purposes: patient-centered description of the disease course, monitoring the impact of policies and best practices, providing a basis for patient advocacy actions, and supporting equity decisions based on the burden of disease and on assessments cutting across all diseases. The variety of disease-specific clinical data and of their observation conditions prevents, at present, their collection within a set of CDE, although they are central to the interests of clinicians and to the scope of many registries. The EPIRARE project suggested that the European RDR Platform could host a section of metadata on the clinical observations collected by individual registries, in order to facilitate traceability of existing data and contacts with registries collecting relevant data.
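
The three-domain organization described above can be sketched as a simple data structure, together with a check that a case record carries the mandatory domain 1 elements. This is a minimal sketch: the element names are hypothetical labels paraphrased from the text, not the entries of Table 1, and `missing_mandatory` is an illustrative helper rather than part of any EPIRARE tool.

```python
from enum import IntEnum


class Domain(IntEnum):
    CASE_NOTIFICATION = 1          # mandatory set
    RISK_FACTORS_AND_SERVICES = 2  # patient characterization, diagnosis, treatments
    OUTCOMES = 3                   # death, HRQoL, education, occupation, co-morbidity


# Hypothetical element names, paraphrased from the description of each domain.
CDE_DOMAINS = {
    Domain.CASE_NOTIFICATION: {
        "patient_code", "patient_location", "care_services", "research_consent",
    },
    Domain.RISK_FACTORS_AND_SERVICES: {
        "genetic_data", "health_status", "family_history",
        "diagnosis_history", "treatments",
    },
    Domain.OUTCOMES: {
        "date_of_death", "hrqol", "education_level",
        "occupational_status", "comorbidity",
    },
}


def missing_mandatory(record: dict) -> set:
    """Return the domain 1 elements that are absent or empty in a case record."""
    mandatory = CDE_DOMAINS[Domain.CASE_NOTIFICATION]
    return {name for name in mandatory if record.get(name) in (None, "")}


# A patient-reported record that lacks the research consent information.
record = {"patient_code": "abc123", "patient_location": "IT", "care_services": "Centre A"}
print(missing_mandatory(record))
```

A check of this kind would fit the validation step recommended for patient-reported data before they enter the database.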

Conclusions

The definition of a set of CDE for the European RDR Platform has different implications for the databases of registries and for the database of the European RDR Platform itself. For registries, this set of CDE is not to be considered as the fixed structure of a common database to be used by all registries regardless of their purposes. Rather, it is intended to provide “building blocks” for the construction of registries for a variety of purposes. A registry should therefore select, besides the mandatory set (domain 1 data), the data elements which are necessary to compute the indicators relevant to the purposes it intends to pursue, and collect the corresponding data according to the definitions and formats proposed. Moreover, if the registry intends to collect outcome data, it is recommended that all the data indicated in domain 3 are collected. Finally, the registry remains free to collect additional data, not included in the set of CDE, for more detailed or specialized observations which are necessary for its own specific study purpose, such as treatment-specific features or disease-specific clinical data. The main aim of adopting the European RDR Platform CDE is therefore to promote the collection, according to common specifications, of the data necessary to compute indicators which are both relevant to the purpose of the registry and key for more general purposes regarding RD, the achievement of which may require indicator and data comparability. The actual practice of collecting these data according to the specifications proposed by EPIRARE, the feasibility of adapting to the proposed specifications and the further usability of data already collected have been studied and are the subject of a manuscript in preparation. Moreover, this practice will contribute, if the forthcoming regulation on data protection allows it, to interoperability and data merging among different registries.
Within a scenario of feasible data sharing, the European RDR Platform could accommodate and use the relevant data communicated by registries to compute, as far as feasible, indicator values from a wider evidence base, or to support the collection of data tailored to the specific features of many different diseases. For these aims, its database should necessarily consist of the full set of CDE and, likely, of additional metadata to facilitate traceability of existing data and contacts with the sources of data, including more detailed or specific observations. The definition of a set of CDE for the European RDR Platform is the first step in promoting the use of common tools for the collection of comparable data on RD patients. The next step in this process is the definition of common references for those data which can be entered following different coding systems, catalogues or measuring scales. The standards and terminologies to be used in the platform should be agreed with clinical and epidemiological experts, possibly with the involvement of representatives of EU national information systems.
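
The selection rule recommended to registries in these conclusions (the mandatory domain 1 set, plus the elements needed for the chosen indicators, plus all of domain 3 when outcome data are collected) can be sketched as follows. This is a minimal sketch under stated assumptions: the element names, the indicator names and the indicator-to-variable mapping are hypothetical, since neither Table 1 nor “Additional file 1” is reproduced here.

```python
# Hypothetical element names for the mandatory set and for the outcome domain.
DOMAIN_1 = frozenset({"patient_code", "patient_location", "care_services", "research_consent"})
DOMAIN_3 = frozenset({"date_of_death", "hrqol", "education_level",
                      "occupational_status", "comorbidity"})

# Hypothetical mapping from indicators to the variables needed to compute them.
INDICATOR_VARIABLES = {
    "cases_by_region": {"patient_code", "patient_location", "diagnosis"},
    "diagnostic_delay": {"date_of_onset", "date_of_diagnosis"},
}


def select_elements(indicators, collects_outcomes=False):
    """Return the data elements a registry should collect for its purposes."""
    selected = set(DOMAIN_1)                     # domain 1 is always included
    for name in indicators:
        selected |= INDICATOR_VARIABLES[name]    # variables behind each indicator
    if collects_outcomes:
        selected |= DOMAIN_3                     # outcome data: the whole of domain 3
    return selected


# A registry interested in diagnostic delay that also collects outcome data.
for element in sorted(select_elements(["diagnostic_delay"], collects_outcomes=True)):
    print(element)
```

Registry-specific additions, such as disease-specific clinical data, would sit outside this selection, since they are not part of the CDE set.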