Key Points

Pharmaceutical companies are expected to monitor the safety of their products using external regulatory databases.

For the majority of drugs, there is no significant difference between the number and types of safety signals that arise from the three main data sources: EudraVigilance Data Analysis System (EVDAS), Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS), and VigiBase®.

1 Introduction

One of the goals of the Transcelerate “Advancing Safety Analytics” collaboration is to develop and share best practices in pharmacovigilance, including those for the adoption of the European Medicines Agency’s (EMA’s) EudraVigilance Data Analysis System (EVDAS) in relation to the other regulatory databases.

Pharmacovigilance of medicinal products relies on statistical signal detection in various data sources. All Marketing Authorization Holders (MAHs) routinely inspect their company safety databases for higher than expected occurrences of drug–event combinations (DECs), usually taking a hybrid approach of manual medical review of adverse event reports of interest and disproportionately reported DECs. In addition to the company databases, three large regulatory databases exist that allow statistical analysis of adverse events. Many organizations may monitor one or more of these data sources for signals of disproportionate reporting (SDRs), for a presumably more robust background compared to their internal safety database. Independent of their utility in raising safety hypotheses, SDRs can characterize the reporting profile of a drug, and support a comparison of product characteristics across databases, by accounting for the distribution of all drugs and all events in the database.

The EMA maintains the EudraVigilance system, managing and analyzing information on suspected adverse reactions to medicines which have been authorized or are being studied in clinical trials in the European Economic Area (EEA). At present, MAHs for selected substances included in the EMA signal management pilot (transitional arrangements for legal requirement) are required to routinely monitor EVDAS. FAERS is the Food and Drug Administration (FDA) Adverse Event Reporting System, a database that contains adverse event and medication error reports submitted to the FDA. Finally, VigiBase® is the World Health Organization’s (WHO’s) global database of individual case safety reports (ICSRs) maintained by the Uppsala Monitoring Centre, Uppsala, Sweden, containing ICSRs submitted since 1968 by member countries of the WHO Programme for International Drug Monitoring. Between those databases, operational characteristics vary for mode of access, frequency of updates (daily updates in EVDAS, quarterly updates in FAERS and VigiBase®), and detail of accessible information.

Among the three databases, we have focused on EVDAS for its most recent availability, and for its legally required monitoring, as opposed to FAERS and VigiBase®, where monitoring is at the discretion of the MAH.

It is well-known that the positive predictive value of disproportionality methods in pharmacovigilance is low [1]. As a result, inspection and triage of the SDRs resulting from one regulatory database (in addition to the company’s internal safety system) can consume significant resources.

Past investigations have focused on comparing different algorithms and their performance in predicting a reference set of actual adverse drug reactions (ADRs) [1], whereas this study fixes the methods and compares the resulting SDRs.

In a survey conducted within this Transcelerate collaboration, it was shown that, since the advent of the EVDAS monitoring requirement, most MAHs are in a holding pattern (continue to monitor or not monitor FAERS and/or VigiBase®). However, the basis for these decisions is not supported by a robust quantitative analysis of the type and number of signals detected in these three sources. In addition, the incremental value of adding a systematic screening of EVDAS to existing processes in signal management has recently been challenged [2].

Statistical signal detection is generally conducted at the MedDRA® Preferred Term (PT) level, whereas signal evaluation is conducted at the level of the medical concept, such as the MedDRA® High Level Term (HLT) or Standardized MedDRA® Query (SMQ) level. To that end, the redundancy between the SDRs in EVDAS, FAERS, and VigiBase® was measured at different levels of the MedDRA® hierarchy. In addition, product characteristics that could impact the level of redundancy between the three databases were analyzed.

2 Methods

One hundred substances were selected for this study, considering the volume of prescriptions in the USA (i.e., typically reflecting some market maturity) [3], a representation of orphan drugs, and the presence of the substance in the EVDAS, FAERS, and VigiBase® databases. The 100 drugs were randomly selected from a curated list of 231 substances; those drugs with their cumulative case counts in the three data sources are shown in the Electronic Supplementary Material (ESM) (see Appendices A and B).

Source data were retrieved on the drug–event level, including the case count associated with the event (coded on the level of the MedDRA® PT), and selected measures of disproportionality, using the following sources and cut-off dates:

  • EVDAS: Electronic Reaction Monitoring Reports (eRMR) for a fixed reference period through 31 July 2019, retrieved through the MAH Pharmacovigilance Queries dashboard [4]

  • FAERS and VigiBase®: Oracle® Health Sciences Empirica Signal, with 2018Q4 (FAERS) and 2019Q1 (VigiBase®) cut-off dates

Any DEC in the source data was screened for SDRs, using two sets of filter criteria: scenario 1 and scenario 2.

Scenario 1 applies a consistent set of thresholds, aligned with the EMA recommendations for screening for adverse reactions in EudraVigilance [5]. To qualify as an SDR, the DEC must meet all the following criteria:

  • Lower boundary of the 95% confidence interval of the reporting odds ratio (ROR02.5) > 1.

  • Five or more cases.

  • Adverse event is an EMA important medical event (IME).
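As an illustration of the scenario 1 filter, the following Python sketch computes the ROR and the lower bound of its 95% confidence interval from a 2 × 2 contingency table, using the standard log-normal (Woolf) approximation, and then applies the three criteria listed above. The function and argument names are hypothetical, and the source does not state how the interval is computed inside EVDAS or Empirica Signal, so this should be read as a textbook approximation rather than the exact computation used in the study.

```python
import math


def ror_lower_bound(a, b, c, d, z=1.96):
    """Return (ROR, lower bound of the two-sided 95% CI).

    Hypothetical helper; cell counts of the 2 x 2 table:
      a: reports with the drug and the event
      b: reports with the drug and any other event
      c: reports with any other drug and the event
      d: reports with any other drug and any other event
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR) under the log-normal approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se_log)
    return ror, lower


def is_sdr_scenario1(a, b, c, d, is_ime, min_cases=5):
    """Apply the three scenario 1 criteria to one drug-event combination."""
    if a < min_cases or not is_ime:
        return False
    _, lower = ror_lower_bound(a, b, c, d)
    return lower > 1.0
```

All three criteria must hold at once, so a strongly disproportionate DEC with fewer than five cases, or with a PT that is not an IME, is not flagged.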

Scenario 2 uses the pre-computed SDR flag from the eRMR (which implies an IME restriction) and a commonly accepted threshold for Bayesian disproportionality, combined with the IME quality of the PT. To qualify as an SDR, the DEC must meet the following criteria:

  • EVDAS: SDR All = yes. The following criteria are met in at least one of the eRMR regions: (1) adverse event is an IME; (2) ROR02.5 > 1; (3) total number of spontaneous (excluding litigation) cases is at least three for substances under additional monitoring or at least five for all other substances [6].

  • FAERS and VigiBase®: lower boundary of the 90% confidence interval of the empirical Bayesian geometric mean (EB05) ≥ 2 and the adverse event is an IME.

Table 1 shows the SDR criteria for both scenarios and the three data sources.

Table 1 SDR criteria for both scenarios and the three data sources

The combined results are extracted as a list of distinct PTs that are present as an SDR in at least one of the databases. Each PT in the combined results is evaluated against the result of the individual database. SDRs are identified on an individual PT basis, as well as for clinically related PTs (adverse events that share an HLT, or an HLT or SMQ):

  • PT level: SDR only if exact PT match in this database; otherwise no SDR.

  • HLT level: SDR if PT shares an HLT with any PT identified as an SDR for this database; otherwise no SDR.

  • HLT and SMQ level: SDR if PT shares an HLT, or at least one narrow-scope SMQ, with any PT identified as an SDR for this database; otherwise no SDR.
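The three matching rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, the level labels, and the two lookup dictionaries (`pt_to_hlt` for the primary-path HLT of a PT, `pt_to_smqs` for its narrow-scope SMQs) are hypothetical, and real MedDRA® relationships would have to be loaded from the licensed dictionary.

```python
def is_sdr_at_level(pt, db_sdr_pts, level, pt_to_hlt, pt_to_smqs):
    """Decide whether `pt` counts as an SDR for one database at a given level.

    pt          : PT from the combined results
    db_sdr_pts  : set of PTs flagged as SDRs in this database
    level       : "PT", "HLT", or "HLT_OR_SMQ" (hypothetical labels)
    pt_to_hlt   : dict PT -> primary-path HLT (assumed lookup)
    pt_to_smqs  : dict PT -> set of narrow-scope SMQs (assumed lookup)
    """
    if level == "PT":
        # Exact PT match only.
        return pt in db_sdr_pts

    own_hlt = pt_to_hlt.get(pt)
    shares_hlt = own_hlt is not None and any(
        pt_to_hlt.get(other) == own_hlt for other in db_sdr_pts
    )
    if level == "HLT":
        return shares_hlt

    if level == "HLT_OR_SMQ":
        own_smqs = pt_to_smqs.get(pt, set())
        shares_smq = any(
            own_smqs & pt_to_smqs.get(other, set()) for other in db_sdr_pts
        )
        return shares_hlt or shares_smq

    raise ValueError(f"unknown level: {level}")
```

Because every PT shares an HLT with itself, a PT-level match implies a match at the two higher levels whenever the PT is present in the lookup, which is why the number of matches can only grow when moving up the hierarchy.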

A practical example for the matching of adverse events on the three levels is provided in Table 2, for insulin aspart (also illustrated with Fig. 7).

Table 2 Insulin aspart: matching adverse events on PT, HLT, and HLT or SMQ level

Two performance metrics are computed from the combined results, illustrating the degree of similarity, with equal contribution from each PT:

  • EVDAS recall: A measure of completeness (how many of the combined SDRs are identified from EVDAS). It is computed as the number of SDRs identified in EVDAS divided by the number of SDRs identified in any database (combined results).

  • EVDAS overlap: A measure of redundancy (how many of the EVDAS SDRs are also identified in FAERS and/or VigiBase®). It is computed as the number of SDRs identified in EVDAS and at least one other database divided by the number of SDRs identified in EVDAS.
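The two definitions above translate directly into set arithmetic. The Python sketch below computes both metrics at the PT level for one substance; the function name and the use of plain sets of PT names are illustrative assumptions, not the study's implementation. At the HLT or SMQ levels, the same ratios would be computed after each PT in the combined results has been flagged per database using the corresponding matching rule.

```python
import math


def evdas_recall_and_overlap(evdas, faers, vigibase):
    """Return (recall, overlap) for one substance.

    Each argument is the set of PTs flagged as SDRs in that database.
    """
    combined = evdas | faers | vigibase      # SDR in at least one database
    others = faers | vigibase                # SDR in FAERS and/or VigiBase
    recall = len(evdas) / len(combined) if combined else math.nan
    overlap = len(evdas & others) / len(evdas) if evdas else math.nan
    return recall, overlap
```

Each PT contributes equally to both ratios regardless of its case count. The two metrics can diverge sharply: a substance whose few EVDAS SDRs all reappear in FAERS has an overlap of 100% even if EVDAS captures only a minority of the combined SDRs.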

Figure 1 presents the concepts of recall and overlap for insulin aspart (also illustrated with Fig. 7).

Fig. 1

Insulin aspart: example of the recall and overlap concepts (PT level, scenario 1). EVDAS EudraVigilance Data Analysis System, FAERS FDA Adverse Event Reporting System, FDA Food and Drug Administration, PT Preferred Term in MedDRA®

Results are also illustrated with circle pack layouts (CPLs), accounting for differences in case counts among PTs, which may in turn reflect the public health impact of an observation (when variation in reporting likelihood is disregarded). CPLs encode information into the size and color of circles, efficiently packed in a constant space. Each item represents one PT in the combined results, reflecting the case count of the DEC totaled across all databases (size), and association with either database (color). Items appear in descending order by case count, from the center of each chart to its periphery.

Item color indicates:

  • Blue (four-step color scale): PT is an SDR in EVDAS only (1), in EVDAS and FAERS only (2), in EVDAS and VigiBase® only (3), or in EVDAS and FAERS and VigiBase® (4).

  • Red (three-step color scale): PT is an SDR in FAERS only (1), in VigiBase® only (2), or in FAERS and VigiBase® only (3) (no SDR in EVDAS).
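The color assignment described above amounts to a lookup on three Boolean flags. The following Python sketch encodes it; the function name and the `(hue, step)` return format are hypothetical, and the study itself rendered the charts with JavaScript and D3, so this is only a language-neutral illustration of the rule.

```python
def cpl_color_class(in_evdas, in_faers, in_vigibase):
    """Map the SDR flags of one PT to a (hue, step) pair for the CPL.

    Blue steps 1-4 cover PTs that are an SDR in EVDAS;
    red steps 1-3 cover PTs that are an SDR only outside EVDAS.
    """
    if in_evdas:
        if in_faers and in_vigibase:
            return ("blue", 4)
        if in_vigibase:
            return ("blue", 3)
        if in_faers:
            return ("blue", 2)
        return ("blue", 1)
    if in_faers and in_vigibase:
        return ("red", 3)
    if in_vigibase:
        return ("red", 2)
    if in_faers:
        return ("red", 1)
    # A PT in the combined results is an SDR in at least one database.
    raise ValueError("PT is not an SDR in any database")
```

Read this way, the share of items at blue steps 2 to 4 among all blue items corresponds to the overlap, and the share of blue items among all items corresponds to the recall.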

MedDRA® 22.0 is used for coding and for the medical grouping of terms of interest. MedDRA® HLT relationships are based on the primary-path associations. MedDRA® SMQ relationships are based on the narrow scope of the SMQ. IMEs are identified per the EMA IME definition as published on their homepage [7].

Source data were extracted in spreadsheet format, with analytical processing in Hypertext Markup Language (HTML) with JavaScript, including the D3js (data driven documents) library, and individual results were rendered in the Chrome browser [8]. Results per substance are aggregated with R 3.5.1 (R Core Team 2018) [9].

In a preliminary analysis of approximately 30 substances, the majority of the substances demonstrated an overlap of 80–90%. Due to the sample size being limited to approximately 30 substances, the confidence interval around the median overlap estimate was wide. In order to determine a more precise estimate, we evaluated the confidence intervals around a median overlap of 80% in sample sizes ranging from 50 to 500 substances. We assumed we may observe a similar median overlap as in the preliminary analysis. The expected width of the confidence interval for 100 substances is approximately 16% (0.72–0.88) for an overlap of 80%. We determined that this sample size of 100 substances provided sufficient precision for this analysis.
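The source does not state how the expected confidence interval was derived. One distribution-free option, shown here purely as an assumption, is the order-statistic interval for a median: the interval runs from the j-th smallest to the (n − j + 1)-th smallest observation, where j is the largest rank for which the lower binomial tail probability stays at or below 2.5%. The Python sketch below returns those ranks for a given sample size; the function name is hypothetical, and turning the ranks into an interval in overlap units would additionally require the observed (or an assumed) distribution of overlap values.

```python
from math import comb


def median_ci_ranks(n, conf=0.95):
    """Ranks (lower, upper) of the order statistics bounding a
    distribution-free confidence interval for the median."""
    alpha = 1.0 - conf
    cumulative = 0.0
    lower_rank = 0
    for k in range(n + 1):
        # P(Binomial(n, 0.5) <= k), accumulated term by term.
        cumulative += comb(n, k) / 2 ** n
        if cumulative > alpha / 2:
            break
        lower_rank = k + 1
    if lower_rank == 0:
        raise ValueError("sample too small for the requested confidence")
    return lower_rank, n - lower_rank + 1
```

Calling the function for sample sizes between 50 and 500 shows how the fraction of the sample spanned by the two ranks shrinks as n grows, which is the effect the sample size reasoning above relies on.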

3 Results

Figures 2, 3, and 4 provide an overview of the percentage of overlap and recall between EVDAS and FAERS/VigiBase® data sources for scenario 1, i.e., application of identical signal detection methods in all three databases. Table 3 summarizes the descriptive statistics. Appendix A in the ESM lists all values of overlap/recall for the individual products.

Fig. 2

Recall and overlap of EVDAS under scenario 1 at the PT level. EVDAS EudraVigilance Data Analysis System, PT Preferred Term in MedDRA®

Fig. 3

Recall and overlap of EVDAS under scenario 1 at the HLT level. EVDAS EudraVigilance Data Analysis System, HLT high level terms

Fig. 4

Recall and overlap of EVDAS under scenario 1 at the HLT or SMQ level. EVDAS EudraVigilance Data Analysis System, HLT high level terms, SMQ Standardized MedDRA® Query

Table 3 Summary of recall and overlap for scenario 1 for all 100 substances

Both overlap and recall increase with the level of the MedDRA® hierarchy (note, while we are not suggesting that signal detection is done at HLT or SMQ level, medical review of a PT-level safety signal would normally find related medical concepts, i.e., at the HLT and/or SMQ level). At the HLT or SMQ level overlap and recall indicate little differentiation between the signal information generated from each of the data sources. Distributions are skewed left, with skewness more pronounced at the higher MedDRA® levels. This implies that most products have high overlap/recall values, with the exception of a small number of outliers (e.g., see Fig. 4, where this is most prominent). Figures 2, 3, 4, 8, 9, 10, 11, 12, and 13 reflect the results for all 100 substances.

In order to illustrate how the signals in EVDAS are coincident with signals from the FAERS and VigiBase® sources, three examples are presented.

The first example (Fig. 5) shows fluticasone and salmeterol, a prescription medication indicated for use in asthma and chronic obstructive pulmonary disease, with almost identical signals emerging from the various data sources. At the HLT or SMQ level (where similar terms are often grouped for medical review), both overlap and recall approach 100% (i.e., recall of 97.9% and overlap of 94.2%), indicating an interchangeable representation of the medical concepts.

Fig. 5

Overlap and recall for fluticasone and salmeterol, matching on the HLT-SMQ level and using scenario 1. EVDAS EudraVigilance Data Analysis System, HLT high level term, SMQ Standardized MedDRA® Query

Nearly all signals found in EVDAS are also found in FAERS/VigiBase® (high overlap). This can be inspected in the CPL through the presence of very few light-blue items, which represent the EVDAS-only signals.

Very few signals are missing from EVDAS (high recall). This can be inspected in the CPL through the near absence of red items, which represent signals not found in EVDAS.

A second example is shown in Fig. 6. This CPL shows benzonatate (a cough medicine), which has 100% overlap and low recall (45.8% at the HLT or SMQ level). The CPL plot shows that all EVDAS signals are also found in FAERS/VigiBase® (100% overlap), but there are a significant number of FAERS/VigiBase® signals not found in EVDAS, represented by the red items.

Fig. 6

Overlap and recall for benzonatate, matching on the HLT-SMQ level and using scenario 1. EVDAS EudraVigilance Data Analysis System, HLT high level term, SMQ Standardized MedDRA® Query

Finally, Fig. 7 shows a “typical” substance, i.e., a substance for which the recall is close to the median recall of 87.9% and the overlap is close to the median overlap of 97.7% at the HLT or SMQ level. One such “typical” substance is insulin aspart, with HLT or SMQ recall of 89.1% and overlap of 98.0%.

Fig. 7

Overlap and recall for insulin aspart, matching on the HLT-SMQ level and using scenario 1. EVDAS EudraVigilance Data Analysis System, HLT high level term, SMQ Standardized MedDRA® Query

As can be seen, there are a small number of medical concepts unique to EVDAS, usually reflecting small case counts (the item sizes are small), while very few signals are present in FAERS/VigiBase® that are not captured in EVDAS (red items). These “missing” signals also reflect smaller case counts.

Figures 8, 9, and 10 provide an overview of the percentage of overlap and recall between EVDAS and FAERS/VigiBase® data sources for scenario 2, i.e., application of the “standard” signal detection methods in the respective three databases. Table 4 summarizes the descriptive statistics. Appendix B in the ESM lists all values of overlap/recall for the individual products.

Fig. 8

Recall and overlap of EVDAS under scenario 2 at the PT level. EVDAS EudraVigilance Data Analysis System, PT Preferred Term in MedDRA®

Fig. 9

Recall and overlap of EVDAS under scenario 2 at the HLT level. EVDAS EudraVigilance Data Analysis System, HLT high level term

Fig. 10

Recall and overlap of EVDAS under scenario 2 at the HLT or SMQ level. EVDAS EudraVigilance Data Analysis System, HLT high level term, SMQ Standardized MedDRA® Query

Table 4 Summary of recall and overlap for scenario 2 for all 100 substances

Comparison of Tables 3 and 4 shows uniformly higher values for recall under scenario 2 relative to scenario 1, while the overlap values are consistently lower.

3.1 Regression Analysis

In order to characterize beyond the individual product level which product characteristics are affecting the level of overlap and recall, several covariates were investigated at the HLT level, including time-on-market (ToM), the difference in time-on-market between first approval in the EU versus the USA (Diff ToM), the number of EudraVigilance cases reported in the EU as a proportion of the total number of spontaneous cases reported in EudraVigilance for a product (EVDAS EU proportion), and the ratio of the number of cases reported in EudraVigilance versus FAERS (EVDAS FAERS ratio). Simple linear regression models were fitted to the data. Table 5 summarizes the results.
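For readers who want to reproduce this kind of univariate fit, the following Python sketch implements ordinary least squares with a single covariate and reports the slope, intercept, and coefficient of determination. It is a minimal illustration: the function name is hypothetical, and the source does not specify the exact model call or any transformation applied to the covariates.

```python
def fit_simple_linear(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    x: covariate values (e.g., one value per substance)
    y: response values (e.g., HLT-level overlap or recall per substance)
    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("covariate has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_resid / ss_total if ss_total > 0 else float("nan")
    return slope, intercept, r_squared
```

Fitting one covariate at a time, as described above, does not adjust for correlation between covariates, so two related covariates can each appear to explain the same variation in the response.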

Table 5 Summary of results of overlap and recall linear regression models

No specific covariates can be identified that systematically affect the EVDAS recall. The two variables that partially determine the overlap are the relative number of EU cases in EudraVigilance and the ratio of EVDAS cases and FAERS cases, presumably due to the differences in marketing authorizations, or market penetration in different regions. Overlap increases with those two ratios. See Fig. 11 for the fitted line graph.

Fig. 11

EVDAS HLT overlap as a function of EVDAS EU proportion and EVDAS FAERS ratio. EVDAS EudraVigilance Data Analysis System, EVDAS EU proportion number of EudraVigilance cases reported in the EU as a proportion of the total number of spontaneous cases reported in EudraVigilance for a product, EVDAS FAERS ratio the ratio of the number of cases reported in EudraVigilance vs FAERS, FAERS FDA Adverse Event Reporting System, FDA Food and Drug Administration, HLT High Level Term

Note that, not unexpectedly, the two variables EVDAS EU proportion and EVDAS FAERS ratio are correlated, as shown in Fig. 12 (P < 0.001).

Fig. 12

EVDAS EU proportion as a function of the EVDAS FAERS ratio. EVDAS EudraVigilance Data Analysis System, EVDAS EU proportion number of EudraVigilance cases reported in the EU as a proportion of the total number of spontaneous cases reported in EudraVigilance for a product, EVDAS FAERS ratio the ratio of the number of cases reported in EudraVigilance vs FAERS, FAERS FDA Adverse Event Reporting System, FDA Food and Drug Administration

4 Discussion

Based on SDRs, we have compared the similarity of reporting characteristics for 100 selected substances in three large regulatory databases, to provide the quantitative evidential basis for the unique value attributed to an individual database, and to provide suggestions for making an informed decision about the screening strategy for individual products.

As can be seen in Figs. 2, 3, and 4, most products display a very high degree of overlap (> 80%) between EVDAS and FAERS/VigiBase®, with most products showing well over 90% overlap at the higher levels in the MedDRA® hierarchy. The significance of this is that most signals generated in EVDAS would be seen in FAERS/VigiBase® as well, assuming the same methodology. Conversely, the recall of potential signals in EVDAS alone relative to the total number of signals in all three data sources also approaches 90% for the “average” product at the HLT or SMQ level. At the PT level, certain signals may not be found in EVDAS that are found in FAERS/VigiBase®. The normal assessment approach for signals is to extend the medical review to related concepts, such as those found in the HLT and/or SMQ of the PT, and therefore, the recall analyses at those higher levels imply that few unique database-specific signals exist at those hierarchy levels relevant to clinical assessment.

A typical product, as shown in Fig. 7 for insulin aspart, would yield very similar signals in EVDAS to those it generates in the other two databases, with near perfect agreement at the higher levels of the MedDRA® hierarchy. Some exceptions to this rule exist, the most obvious example being benzonatate, shown in Fig. 6. The recall of signals in EVDAS is only 29.2% at the PT level (note, both PT- and HLT-level results are available in the ESM). Upon inspection of the eRMR for benzonatate, it was found that only 627 cases are present, distributed over 392 PTs. It is not surprising that 88% of those PTs are reported in two or fewer cases. So, the limited number of reports in EVDAS (relative to FAERS, which for this product has 2267 cases) may result in fewer signals being detected. In fact, upon inspecting the four products with the lowest EVDAS case counts, it was seen that these products have the lowest recall (< 60%) at the HLT or SMQ level (e.g., Fig. 13). The four products that display a recall of < 60% have 1506 or fewer cases, with the lowest recall product only containing 213 cases in EVDAS. These four products correspond to the encircled dots in Fig. 13.

Fig. 13

EVDAS recall as a function of total spontaneous cases in EVDAS, with the four products with the lowest EVDAS case counts circled. EVDAS EudraVigilance Data Analysis System, HLT high level terms, SMQ Standardized MedDRA® Query

Interestingly, the outlier to the far right of Fig. 13 is levothyroxine. In VigiBase®, 50% of the cases are accounted for by only 28 out of 4897 MedDRA® PTs, compared to a more typical example where the distribution of cases is less skewed (e.g., for metoprolol, 50% of the cases are spread over 78 out of 4348 PTs). The skewed case distribution for levothyroxine is conceivably, at least in part, due to stimulated reporting following a reformulation of the product that received negative media coverage in France [10].

The univariate analyses do not present any strong pattern explaining overlap between EVDAS and FAERS/VigiBase®, with the exception of the relative case count in EVDAS versus FAERS and the similar measure of “relative number of EU cases to total cases in EVDAS”, with the overlap increasing for higher ratios. Even for products with similar EVDAS versus FAERS ratios, much of the variability is not fully explained, and while the vast majority of products cluster around the median overlap value of 90% at the HLT level, some excursions to lower overlap values are observed even when the EU ratio of cases is large (e.g., 50%). The recall of EVDAS signals seems to significantly drop off if total spontaneous case counts drop below 1500, while no other variable investigated has been shown to have any effect on the recall.

When the semi-standard Bayesian methods were applied to FAERS/VigiBase®, while the SDR(All) approach was used in EVDAS for the same product, as expected, the overlap went down, while recall in EVDAS increased, consistent with the lower specificity and higher sensitivity of the frequentist methodology. Even so, at the higher levels of the MedDRA® hierarchy, recall and overlap approach 95% and 85%, respectively, demonstrating that the method used is not quite as impactful at those levels of signal detection.

We found that for a few products, the choice of method does have a significant impact on the number of SDRs found in EVDAS. As an example, prednisolone yields 846 SDRs when applying the ROR(-) method and 618 SDRs when using the SDR(All) column in the eRMR spreadsheet. Other products display the opposite behavior, with ROR(-) yielding fewer signals than SDR(All). One example is ciprofloxacin, which shows 182 versus 275 SDRs using ROR(-) and SDR(All), respectively. The distribution of the difference in the number of SDRs [defined as ROR(-) SDRs minus the SDR(All) SDRs] is shown in Fig. 14.

Fig. 14

Difference in SDR counts at the PT level between ROR(-) method and SDR(All) method. PT Preferred Term in MedDRA®, ROR reporting odds ratio, SDR signal of disproportionate reporting

We did not assess the novelty, credibility, or medical urgency of the PTs (or medical concepts) that differed among the individual databases. Additional results do not necessarily imply a need for further action, as they may be due to the established safety profile of the drug, be covered in ongoing regulatory procedures, or not justify verificatory actions for other reasons. The differences quantified in this study can be interpreted as an upper boundary of potentially relevant observations.

5 Conclusion

For a broad cross-section of medicinal products, disproportionality analysis in the three main regulatory databases generally yields very similar SDRs. For many substances, the extent to which potentially relevant observations are unique to one database is quantifiably low. However, we have identified a few exceptions to this general pattern, with no complete explanation among the covariates that we examined. Operational considerations include the mode of database access, frequency of updates (ranging from daily to quarterly), and detail of accessible information. Legal requirements for monitoring vary, as does the approach among MAHs for those databases where monitoring is optional. We have not investigated the medical relevance of differences in results, and we expect that the likelihood of justifying verificatory action and the potential to contribute important new safety insights will vary. When selecting databases for signal detection, organizations typically consider regulatory expectations, operating performance (such as positive predictive value), and procedural complexity. As SDRs can be seen as a proxy of general reporting characteristics identifiable in a systematic screening process, our results indicate that, for most products, these characteristics are largely similar in each of the databases.