The European Commission and patient associations identify registries as strategic instruments for improving knowledge in the field of rare diseases [1, 2]. Interoperability between Rare Diseases Patient Registries (RDPR) is especially needed to support research activities, to validate therapeutic treatments and to plan public health actions. Because RDPR are extremely heterogeneous, a uniform and standardized way of collecting data is needed, together with the identification of specific levels of connection between RDPR with similar aims.

In this study, exploratory data analyses were applied to the EPIRARE (European Platform for Rare Diseases Registries) Registry Survey in order to generate a macro-classification and characterization of RDPR and to explore their different information needs.

First, a Multiple Correspondence Analysis (MCA) suggested associations between selected variables characterizing the structure of RDPR (Figure 1). Then, a cluster analysis (CA) was performed on the declared “Aims” of each RDPR. CA confirmed the variable associations that emerged from the MCA and identified three groups, defined as Public Health Registries (PHR), Clinical-Genetic Research Registries (CGRR), and Treatment Registries (TR). Finally, the random forest (RF) method was applied to the Survey data, yielding six classification models with good predictive power and thus supporting the division of RDPR into three groups. RF also identified several informative variables that characterize the three categories of RDPR, which are defined by data of different nature and by different levels of diffusion (Table 1).
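The clustering step can be illustrated with a minimal sketch. The abstract does not state which clustering algorithm or distance was used, so the choices here (average-linkage agglomerative clustering on the Jaccard distance between sets of declared aims) are assumptions made only for illustration. The registry identifiers and aim names are hypothetical toy data, not values from the EPIRARE Survey.

```python
from itertools import combinations

# Hypothetical toy data: each registry declares a set of aims.
# Identifiers and aim names are illustrative, not taken from the survey.
REGISTRIES = {
    "reg_a": {"epidemiology", "health_planning"},
    "reg_b": {"epidemiology", "health_planning", "natural_history"},
    "reg_c": {"genotype_phenotype", "natural_history"},
    "reg_d": {"genotype_phenotype", "clinical_research"},
    "reg_e": {"treatment_evaluation", "treatment_monitoring"},
    "reg_f": {"treatment_evaluation", "treatment_monitoring", "clinical_research"},
}


def jaccard_distance(a, b):
    """One minus the ratio of shared aims to all aims declared by either registry."""
    union = a | b
    if not union:
        return 0.0  # two registries with no declared aims are treated as identical
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, data):
    """Mean pairwise Jaccard distance between the members of two clusters."""
    total = sum(jaccard_distance(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_registries(data, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [frozenset([name]) for name in sorted(data)]
    while len(clusters) > n_clusters:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], data),
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted(sorted(c) for c in clusters)


# Assumption: three groups are requested, mirroring the three classes reported above.
groups = cluster_registries(REGISTRIES, n_clusters=3)
print(groups)
```

On this toy input the registries that share public-health, clinical-genetic, or treatment aims end up in the same group. With real survey data the number of groups would normally be chosen by inspecting the merge distances rather than fixed in advance.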

Figure 1 Factorial plane from the MCA.

Table 1 Main characteristics of Clinical-Genetic Research, Treatment, and Public Health Registries according to the most informative variables identified by the random forest method. The variables reported in the table characterize most of the registries in each class.

These results, which identify different profiles of RDPR and their specific information needs, provide support for the design of a European platform for Rare Diseases. Identifying informative cores could guide the activities of a platform able to enhance the sharing of information between RDPR with common aims, and also to facilitate a coherent dialogue between RDPR with different profiles.

Guide to interpretation of Figure 1: the arrows indicate the directions of association among the aims; the size of each circle represents the frequency of the variable. The higher the coordinate and the frequency of a variable, the more it contributes to the interpretation of the factorial axis; variables lying in the same direction are correlated.
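The link between coordinate, frequency and contribution can be made explicit with the standard correspondence-analysis definition of a category's contribution to an axis. The notation is introduced here for illustration and does not appear in the original: $n$ is the number of registries, $Q$ the number of variables analysed, $n_k$ the number of registries in category $k$, $g_{ks}$ the principal coordinate of category $k$ on axis $s$, and $\lambda_s$ the eigenvalue (inertia) of axis $s$.

```latex
\mathrm{ctr}_s(k) = \frac{m_k \, g_{ks}^{2}}{\lambda_s},
\qquad
m_k = \frac{n_k}{nQ},
\qquad
\sum_{k} \mathrm{ctr}_s(k) = 1 .
```

A category therefore weighs more in the interpretation of axis $s$ when it is both frequent (large $m_k$) and far from the origin along that axis (large $g_{ks}^{2}$).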