Key Points

Overall, a spontaneous reporting system (SRS) is better suited to detection of signals than an electronic health record (EHR)-based system, especially for certain types of reactions (rare events and those with a high drug-attributable risk).

Use of EHRs might be justifiable in some situations where SRSs perform poorly (e.g. outcomes with a high background incidence), provided that the additional costs can be taken into account.

SRSs and EHR-based signal detection systems can be complementary, the additional value of one to the other varying across events, as a function of the background incidence of the event.

1 Introduction

Spontaneous reporting systems (SRSs) have long been the foundation of post-marketing drug safety surveillance [1]. Despite their broad utilization, information found in such systems is sometimes limited [2] and, as a consequence, decisions based solely on data from these systems often need to take into account significant uncertainties [3]. In many instances, it is not sufficient to use a single source of information to understand a particular drug safety issue [4], and there is a recognized need to put together most, if not all, available relevant sources in an efficient way.

Following the drug safety concerns that surfaced between 2004 and 2007, discussions began as to whether a more proactive approach to signal detection could be adopted instead of relying on passive surveillance systems. In both Europe and the USA, researchers explored whether electronic healthcare record (EHR) databases, which comprise detailed data collected longitudinally and routinely in actual care for large-scale populations [5], could be used for post-marketing safety surveillance. EHRs have been primarily used for signal evaluation studies; however, in recent years, various projects have explored ways of using them as an additional source for signal detection systems, e.g. the Observational Medical Outcomes Partnership (OMOP) [6, 7], Pharmacoepidemiological Research on Outcomes of Therapeutics by a European Consortium (PROTECT) [8] and Exploring and Understanding Adverse Drug Reactions (EU-ADR) [9, 10].

To date, only two studies [11, 12] have tried to combine both sources in order to support the signal detection process, while the majority of the available research has focused rather on comparing the two systems in terms of overall performance and usefulness [13–16].

The aim of this study was to investigate in which particular situations EHR-based signal detection systems may add value to already existing SRSs, focusing on a limited set of adverse events of considerable importance in pharmacovigilance. To express this added value, we used performance indicators, including percentages of unilaterally identified signals and sensitivity to describe the ‘gains’, as well as the number needed to detect (NND), for the ‘costs’ associated with signal detection.

2 Methods

2.1 Design

A signal detection strategy focused on a limited set of adverse events was performed retrospectively in two database systems, (1) EU-ADR and (2) EudraVigilance, from 1 January 2000 to 1 January 2010. These are described separately below. The two systems were considered individually, and the soundest possible event definitions and implementations of signal detection methods were applied in each database independently in order to optimize the performance of each one. In this study, we used the term ‘signal’ to refer to a signal of disproportionate reporting (SDR), as defined by the Council for International Organizations of Medical Sciences (CIOMS) Working Group VIII [1], in the context of signal detection in EudraVigilance, and to a statistically significant drug–adverse event association that met a specific threshold of increased risk in the context of EU-ADR. All drugs captured in either of the two systems were considered. Drugs not identifiable at the fifth level of the WHO Anatomical Therapeutic Chemical (ATC) classification system, as well as herbal supplements, were excluded.

2.2 Events of Interest

We considered the following five events, selected from a list of events previously identified as important on the basis of expert judgment and predefined criteria [17]: (1) acute myocardial infarction (AMI), (2) bullous eruption, (3) hip fracture, (4) acute pancreatitis and (5) upper gastrointestinal bleeding (UGIB). These events were chosen because of their diversity in aetiology, background incidence and drug-attributable risk attributes, which we considered might have an impact on the performance of the two systems. To investigate the possible correlation between the signal detection performance of each system and the frequency of the events, we ranked our events of interest according to the empirically determined background frequency (i.e. the incidence rate in the general population). These incidence rates were derived from the EU-ADR network in order to maintain the same base population across events, which allowed for a more reasonable comparison [9].

2.3 The SRS: EudraVigilance

As the exemplar for an SRS, we used EudraVigilance, a web-based information system launched in December 2001 and designed to manage information on suspected adverse drug reactions (ADRs) that are reported for drugs licensed in Europe. The total number of individual reports as of December 2013 was 4.5 million, with 38 % of cases originating from the European Union and 62 % from the rest of the world [18].

2.3.1 Capturing Events of Interest

In EudraVigilance, suspected ADRs are coded using the Medical Dictionary for Regulatory Activities (MedDRA®), an international medical terminology system developed under the auspices of the International Conference on Harmonisation (ICH) [19]. For capturing the events of interest, we used adapted searches derived ad hoc from standardized MedDRA® queries (SMQs) [20], similar to the approach used by Patadia et al. [16].

2.3.2 Method of Signal Detection

Signal detection in EudraVigilance was performed using the proportional reporting ratio (PRR) method [21], previously validated in this database by Alvarez et al. [22]. Only cases received within the study period (from 1 January 2000 to 1 January 2010) were considered for identification of signals.

The threshold chosen to define a signal was a lower limit of the confidence interval of the PRR greater than 1, together with at least three cases being reported with the investigated association [23]. No further adjustment was done for possible confounding variables.
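The signalling rule described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function name `prr_signal`, the 2×2 layout of report counts and the 95 % confidence level (`z = 1.96`) are assumptions, and the interval uses the commonly applied log-scale approximation of the standard error of the PRR.

```python
import math


def prr_signal(a, b, c, d, z=1.96, min_cases=3):
    """Return (prr, lower, upper, is_signal) for a 2x2 table of report counts.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest, without the event
    c: reports with all other drugs and the event
    d: reports with all other drugs, without the event
    The 95 % level (z = 1.96) is an assumption; the text does not state it.
    """
    if a == 0 or c == 0:
        # The PRR or its log-scale standard error is undefined: flag nothing.
        nan = float("nan")
        return (nan, nan, nan, False)
    prr = (a / (a + b)) / (c / (c + d))
    # Standard error of ln(PRR), log-scale approximation
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(prr) - z * se)
    upper = math.exp(math.log(prr) + z * se)
    # Signal: lower confidence limit above 1 AND at least three cases
    is_signal = lower > 1 and a >= min_cases
    return (prr, lower, upper, is_signal)
```

With this sketch, an association with only two cases is never flagged, however large the PRR, because the case-count criterion is applied alongside the confidence limit.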

2.4 The EHR-Based System: EU-ADR

As the exemplar for an EHR-based system, we used EU-ADR, a computerized system designed to detect potential ADRs and built on a network of established databases from various European countries [9]. Data from seven databases in three countries (Denmark, Italy and The Netherlands) were used in this study. EU-ADR includes both population-based primary care databases (the Integrated Primary Care Information [IPCI] database [The Netherlands] and Health Search/CSD Patient and Pedianet [Italy]) and record-linkage systems (the Aarhus University Hospital Database [Denmark], the PHARMO Network [The Netherlands] and the regional Italian claims databases of Lombardy and Tuscany). The source population covered by the database network is approximately 20 million patients. Drug exposure in EU-ADR was identified from prescription or dispensing data (depending on the database), using ATC codes. Prescriptions with the same ATC code where the start date of one prescription preceded the end date of the other prescription were merged into a single episode of drug use, starting at the beginning of the first prescription and ending at the end of the last prescription. Periods of concomitant drug use were labelled as separate episodes. Only current exposure (within 30 days of an event of interest) was considered [9]. The characteristics of the EU-ADR network have been extensively described elsewhere [9, 24].
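The merging of overlapping prescriptions into episodes of drug use can be illustrated with a short sketch. The function name `merge_episodes` and the tuple layout are hypothetical, and the sketch covers only the overlap rule for prescriptions sharing an ATC code; the labelling of concomitant use and the 30-day current-exposure window are not modelled. A prescription is treated as overlapping only when it starts strictly before the previous one ends, which is one reading of the rule above.

```python
from datetime import date


def merge_episodes(prescriptions):
    """Merge overlapping prescriptions with the same ATC code into episodes.

    prescriptions: iterable of (atc_code, start_date, end_date) tuples.
    Returns a dict mapping each ATC code to a sorted list of (start, end).
    """
    by_atc = {}
    for atc, start, end in prescriptions:
        by_atc.setdefault(atc, []).append((start, end))

    episodes = {}
    for atc, spans in by_atc.items():
        spans.sort()
        merged = [spans[0]]
        for start, end in spans[1:]:
            last_start, last_end = merged[-1]
            if start < last_end:
                # Starts before the running episode ends: extend the episode
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        episodes[atc] = merged
    return episodes


# Illustrative input only; the ATC codes and dates are not study data.
example = [
    ("C09AA02", date(2005, 1, 1), date(2005, 1, 31)),
    ("C09AA02", date(2005, 1, 20), date(2005, 2, 20)),
    ("C09AA02", date(2005, 6, 1), date(2005, 6, 30)),
    ("N02BE01", date(2005, 3, 1), date(2005, 3, 10)),
]
example_episodes = merge_episodes(example)
```

In the example, the first two prescriptions for the same ATC code collapse into one episode running from the start of the first to the end of the second, while the third remains a separate episode.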

2.4.1 Capturing Events of Interest

Definitions for each event of interest were previously constructed by a team of experts and, on the basis of those, queries were performed in each database in the network, using the corresponding diagnosis coding schemes—the International Classification of Diseases (ICD) 9th and 10th revisions, and the International Classification of Primary Care—supplemented with additional criteria such as laboratory values and unstructured free-text searches, where applicable. The results were subsequently pooled across all databases [25]. The events AMI and UGIB had previously been validated in the databases concerned [26, 27].

2.4.2 Method of Signal Detection

For EU-ADR, a signal detection method specifically developed for EHR data was used: the Longitudinal Gamma Poisson Shrinker (LGPS) [28]. LGPS is a cohort-based method, adapted from a Bayesian method (DuMouchel’s Gamma Poisson Shrinker, designed for use in an SRS [see the Electronic Supplementary Material]), which uses person-time rather than case counts for estimation of the expected number of events. Previous evaluation against other signal detection methodologies showed that LGPS is the best-performing method in this database system [29]. We applied a threshold to the LGPS risk ratio (RRLGPS) of a lower limit of the 95 % credible interval >1 [28]. After LGPS, we applied a second method, Longitudinal Evaluation of Observational Profiles of Adverse events Related to Drugs (LEOPARD), which adjusts for possible protopathic bias and improves performance. LEOPARD is based on comparison of rates of drug prescriptions initiated within a fixed time window (±25 days) prior to and after the occurrence of an event, on the basis of the assumption that an increase in the number of prescriptions started after an event relative to the number of prescriptions started prior to the event is an indication of protopathic bias. From a statistical perspective, this is a binomial test, which compares the distributions of prescriptions across those two time windows [28].
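The binomial comparison underlying LEOPARD can be sketched as below. This is an illustration of the idea, not the published algorithm: the function name `leopard_flag`, the one-sided form of the test, the null probability of 0.5 and the significance level `alpha = 0.05` are all assumptions made for the example. The inputs are the numbers of prescriptions started in the fixed window before and after the event.

```python
from math import comb


def leopard_flag(n_before, n_after, alpha=0.05):
    """Exact one-sided binomial test for an excess of prescriptions after an event.

    Under the null hypothesis, a prescription is equally likely to start in the
    window before or after the event (p = 0.5). Returns (p_value, flagged),
    where flagged=True suggests possible protopathic bias.
    """
    n = n_before + n_after
    if n == 0:
        return (1.0, False)
    # P(X >= n_after) for X ~ Binomial(n, 0.5)
    tail = sum(comb(n, k) for k in range(n_after, n + 1))
    p_value = tail / 2 ** n
    return (p_value, p_value < alpha)
```

For instance, 5 prescriptions started before the event and 20 after it give a very small p value and would be flagged under these assumptions, whereas 10 before and 10 after would not.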

2.5 Verification of Signals

We reviewed the currently available literature in order to determine which drug–adverse event associations identified in the dataset represented already known associations. In contrast to the approach used in previous studies, verification was performed for all drug–event associations, irrespective of whether a potential signal was flagged or not by any of the signal detection methods used in either EudraVigilance or EU-ADR.

An automatic tool developed within EU-ADR, which searches Medline-indexed publications concerning adverse drug reactions [30, 31], was used to qualify the drug–event associations as ‘ADRs’ (i.e. already known to be true) or ‘non-ADRs’. For each drug–event association, Medline citations with co-occurrence of the drug and the adverse event of interest were extracted and manually reviewed by two independent evaluators with experience in pharmacovigilance and pharmacoepidemiology. For the list of ADRs, we considered only those with at least three independent PubMed citations that showed a potentially causal association. The lowest level of evidence accepted was three case reports mentioning the occurrence of a specific adverse event in individual patients exposed to the specific drug. Discrepancies in the assessment were resolved through discussion. All associations for which not enough evidence was found in the literature (i.e. <3 confirmatory articles) were considered ‘non-ADRs’. For these, a random sample of 5 % of drug–event associations for each of the five events of interest was manually reviewed by the two evaluators. In addition, if at least two drugs belonging to the same therapeutic class (defined by having in common the first five digits of their ATC codes) had a positive association with a certain event, a class effect was suspected and an additional manual review of the summary of product characteristics (SmPC) was done to see if other drugs in the class were also associated with that specific event. Figure 1 shows a schematic workflow of the verification process.

Fig. 1 Workflow of verification of signals. *A 5 % random sample was manually checked to see if the classification as non-adverse drug reactions (non-ADRs) was accurate

For the purposes of this study, drug–event associations that were suggested to be ADRs according to the criteria described above were assumed to be ADRs; otherwise, these associations were assumed to be non-ADRs.

2.6 Performance Indicators

In order to assess the complementarity of the systems and to calculate the costs associated with identification of potential signals from different sources, we used the following indicators:

Percentage of unilaterally identified signals is a variation of sensitivity (recall) metrics, which uses as the numerator the number of true associations identified in one system that were not identified in the other. We considered this variable to be useful in quantifying the incremental value of each system.

Sensitivity was calculated as the proportion of associations correctly identified by the method out of the total pool of ADRs known to be true from the literature [32].

Number needed to detect (NND), originally described by Hauben et al. [33] in the context of signal detection within an SRS, was used as a proxy to express the costs associated with each source of signals. This represents the number of signals that would have to be reviewed to detect a single signal that was proven to be true.

Spearman’s correlation coefficient (R value) was calculated to determine whether there was a correlation between the background incidence of the events and each system’s capability to detect signals.
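The first three indicators above can be expressed over sets of drug–event associations, as in the following sketch. The function name `performance` and the set-based representation are hypothetical, and the denominator of the percentage of unilaterally identified signals is assumed here to be the total pool of known ADRs, in line with the sensitivity definition; the text specifies only the numerator.

```python
def performance(flagged_a, flagged_b, known_adrs):
    """Indicators for system A, using system B as the comparator.

    flagged_a, flagged_b: sets of associations flagged in each system
    known_adrs: set of associations classified as ADRs from the literature
    """
    true_a = flagged_a & known_adrs
    true_b = flagged_b & known_adrs
    n_true = len(known_adrs)
    return {
        # Known ADRs flagged by A, out of all known ADRs
        "sensitivity": len(true_a) / n_true,
        # Known ADRs flagged by A but not by B (assumed denominator: all known ADRs)
        "unilateral": len(true_a - true_b) / n_true,
        # Signals to review per true signal found
        "nnd": len(flagged_a) / len(true_a) if true_a else float("inf"),
    }


# Toy identifiers for illustration only; these are not study data.
known = {"d1", "d2", "d3", "d4"}
system_a = {"d1", "d2", "x1", "x2", "x3", "x4"}
system_b = {"d2", "d3", "x5"}
result_a = performance(system_a, system_b, known)
result_b = performance(system_b, system_a, known)
```

In this toy example, system A flags six associations of which two are known ADRs, so three signals must be reviewed per true signal, and one of its two true signals is not found by system B.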

3 Results

From an initial dataset of 5536 drug–event associations overall, 1490 potential signals (27 %) were flagged in either EudraVigilance or EU-ADR (Fig. 2). Upon signal verification, the ratio of ADRs to non-ADRs varied from 1:6 for acute pancreatitis to 1:19 for hip fracture.

Fig. 2 Distribution of potential signals in the EudraVigilance and exploring and understanding adverse drug reactions (EU-ADR) databases, grouped according to the anatomical therapeutic chemical (ATC) classification therapeutic subgroup (note: only classes with >30 potential signals are shown)

The therapeutic classes comprising the majority of potential signals identified in EudraVigilance were agents acting on the renin–angiotensin system, antivirals for systemic use and antithrombotic agents, while for EU-ADR they were anti-asthmatics, psychoanaleptics and antiepileptics (see Fig. 2). The percentages of potential signals identified in both systems ranged from 2 to 24 %.

The median sensitivity for detecting signals across all events in EudraVigilance was 42 % (range 20–71 %) and for EU-ADR it was 27 % (range 23–60 %), with the values depending on the event of interest (see Fig. 3). AMI was the hardest to detect among all five events, with 65 % of known AMI associations from the literature not being flagged in either database system. Hip fracture and bullous eruption seemed to be the easiest to identify overall, with only 21 and 28 % of known associations remaining undetected, respectively. From a system perspective, the most easily identified events in the SRS were bullous eruption, acute pancreatitis and UGIB, while in the EHR they were hip fracture, UGIB and AMI.

Fig. 3 Contribution of each system to signal detection (i.e. percentage of ‘positive’ associations detected out of the total adverse drug reaction [ADR] pool in the literature). ‘N’ indicates the total number of true associations in the dataset, and ‘found in neither’ indicates that the association was not highlighted as a signal in any of the databases during the signal detection process. AMI acute myocardial infarction, BE bullous eruption, EU-ADR exploring and understanding adverse drug reactions, FRACT fracture, PANC pancreatitis, UGIB upper gastrointestinal bleeding

The background incidence of the events, obtained from EU-ADR, was plotted against the percentage of unilaterally identified signals. The contribution of each database to signal detection appeared to be correlated with the background incidence of the events, being positively although non-significantly correlated in EU-ADR (R = 0.7, P = 0.18) and inversely and significantly correlated in EudraVigilance (R = −1, P < 0.01) (see Fig. 4).

Fig. 4 Correlation between the background incidence of events and the contribution of each system to signal detection. The background incidences of the events (per 100,000 person-years), estimated from exploring and understanding adverse drug reactions (EU-ADR) data, pooled across all databases, were bullous eruption = 4.2, pancreatitis = 21.4, upper gastrointestinal bleeding = 82.2, hip fracture = 117.7 and acute myocardial infarction = 153.7. ‘Identified signals’ refer to signals proven to be known adverse drug reactions
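A rank correlation of this kind can be computed as in the sketch below, which ranks both variables (averaging ranks for ties) and then takes the Pearson correlation of the ranks. The background incidences are those listed in the caption of Fig. 4; the percentages in `hypothetical_pct` are placeholders made up purely to show the calculation and are not results of this study. The function names are likewise illustrative.

```python
def ranks(values):
    """Return 1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sd_x * sd_y)


# Incidences per 100,000 person-years, from the Fig. 4 caption
# (bullous eruption, pancreatitis, UGIB, hip fracture, AMI)
incidence = [4.2, 21.4, 82.2, 117.7, 153.7]
# HYPOTHETICAL placeholder percentages, not study results
hypothetical_pct = [50, 40, 30, 20, 10]
rho = spearman(incidence, hypothetical_pct)
```

Any strictly decreasing set of percentages yields a coefficient of −1 regardless of the actual values, because only the ranks enter the calculation.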

The associated costs were expressed as the number of signals that would need to undergo review and further investigation for one true safety issue to be identified. The costs associated with detecting signals, expressed by NNDs, were highly variable across events. With the exception of bullous eruption, it seemed to be more ‘costly’ to detect safety signals in EU-ADR than in EudraVigilance, with a median NND across all events of 7 versus 5. The most ‘costly’ event in EudraVigilance was bullous eruption (NND = 8), and the least ‘costly’ were UGIB and acute pancreatitis (NND = 2). In EU-ADR, the most costly signals to detect were those related to hip fracture (NND = 9) and AMI (NND = 7), while the least costly were those related to pancreatitis and bullous eruption (NND = 3) (see Fig. 5).

Fig. 5 Numbers needed to detect a true association. Note: this represents the ‘cost’ per signal, e.g. in the case of detection of acute myocardial infarction (AMI) in EudraVigilance, 5 signals needed to be investigated in order to detect 1 true association. BE bullous eruption, EU-ADR exploring and understanding adverse drug reactions, FRACT fracture, PANC pancreatitis, UGIB upper gastrointestinal bleeding

4 Discussion

In this study, we aimed to investigate the additional value of an EHR-based system for signal detection over and above a traditional SRS. We focused on five different adverse events deemed to be important in pharmacovigilance: bullous eruption, AMI, acute pancreatitis, hip fracture and UGIB.

EudraVigilance identified more signals overall than EU-ADR: 41 % of signals (187/458) compared with 32 % (147/458), as previously shown in other studies [16]. This was not unexpected, considering that EudraVigilance has worldwide coverage, whereas EU-ADR covers only three countries. Furthermore, the EudraVigilance system is primarily designed for signal detection.

Patadia et al. [16] performed a time-restricted analysis (before and after a safety communication/media attention) and showed that this has an impact on the numbers of signals detected in both data sources in opposite ways (i.e. an increase in the number of signals in the SRS after media attention and a decrease in the EHR). While this effect might partially explain our findings of higher sensitivity for EudraVigilance, we consider it unlikely that it entirely explains the difference, since the majority of signals that were tested did not attract media attention.

The capacity of EU-ADR and EudraVigilance to detect signals was shown to differ depending on the nature of the adverse event being investigated. The relatively poor performance of EudraVigilance in detecting hip fractures and AMI might be because neither event is usually perceived as being drug induced, so both often fail to be recognized and reported as ADRs, as has previously been hypothesized [34]. The suspected ADRs documented in an SRS such as EudraVigilance are highly dependent on the reporter’s ability to recognize them as such, and some characteristics are helpful in this respect: biological concordance with the drug mechanism of action, a short time to onset, a positive dechallenge and lack of alternative causes. Adverse events that are not so obviously attributable to drugs (because they are multifactorial), or that already have a high background incidence, are likely to be poorly captured by spontaneous reports [1, 3]. EHR-based systems do not rely on reporter judgment; therefore, these events may be better represented in such systems.

On the other hand, there is very low sensitivity for detecting rare drug-induced events in EU-ADR, and this is in line with previous research, which showed that, despite the broad coverage of the EU-ADR network (around 20 million patients), there is simply not enough (statistical) power to identify very rare events in the database [24]. Thus, for rare events that have drug treatment as their primary aetiology, SRSs still seem to be the better solution so far. Our findings are consistent with those of Patadia et al. [34], although different parameters were used to determine the complementarity of the two systems.

We found a correlation between the background incidence of the events and the contribution of each database to signal detection; the correlation was statistically significant for EudraVigilance but not for EU-ADR, possibly because of the small number of events tested.

The burden associated with screening any data source for signals depends on the number of signals that require further assessment or investigation and the workload involved in each of these investigations. The amount of work needed to confirm or refute a signal is highly variable, ranging from simple product information checks to more complex analyses and formal pharmacoepidemiological studies. In their study, Pizzoglio et al. [35] reported a median time of 6 h for initial assessment of a signal (range 2–26 h). We did not collect similar information on the time spent on assessment of the signals in our study, since a semi-automatic method was used; however, we considered the number of signals that needed further investigation to be a reasonable proxy to express the associated costs. We found that for all of the events (with the exception of bullous eruption), it was more costly to identify signals within the EHR-based system. However, there was a notable difference across the events. For hip fracture, where EU-ADR provided the most added value, an 80 % increase in the cost per signal was observed in comparison with detection in EudraVigilance. On the other hand, for acute pancreatitis, the cost associated with signal detection in EU-ADR was not much higher than that in EudraVigilance; therefore, in this case, it may be justifiable to use both systems. Because the two systems provided roughly the same contribution to detection of UGIB-related signals, the value of supplementing SRSs with EHR-based systems is probably dependent on the types of drugs that can be captured in the particular EHR database. Detecting AMI proved to be costly in both systems; however, the seriousness of the event, its public health impact and the difficulty of detection might justify the extra cost of using both an SRS and an EHR-based system. For bullous eruption, because of the lack of an additional gain (only one extra signal was identified), it might not be efficient to use the EHR as a secondary signal source.

The range of events tested in this study, albeit carefully selected, represented only a small sample of all possible adverse events, which limits the external validity of this research. The applicability of our findings to a broader range of events will require further investigation. In addition, the overall background incidence of the events that were selected was slightly skewed towards more common events and, if our hypothesis holds true, this might have resulted in a bias favouring EHR performance.

Another limitation was the retrospective nature of the study: we actually tested the capacity of systems not to identify signals (i.e. new associations) but rather to detect already identified safety issues. As Norén et al. [36] pointed out, ideally the evaluation should be done using emerging safety issues and not well-established ADRs. However, this is a common limitation in signal detection research because of the difficulties that accompany building a ‘reference standard’ and the long time needed to gather data prospectively.

Moreover, combining the two systems involved many decisions regarding the choice of the signal detection methods and their implementation (e.g. the choice of thresholds and precision estimates), which had a substantial impact on the results [37]. In addition, it has been repeatedly demonstrated that the aggregation level at which data mining is performed influences the results [38, 39], and we consider that the appropriate level is still an open question. Our decision to use SMQs instead of preferred terms (PTs) may therefore have had an impact on the results. We acknowledge that use of different signal detection methodologies and reference standards, as well as definitions of events used in data mining, might lead to different results. Therefore, our results might, to some extent, reflect the relative strengths of each data source, but they would also be influenced by the algorithms that were chosen. Although we acknowledge that the time to signal detection is an important element to take into account, for this study we focused primarily on other quantitative measures, such as the number of signals and the number of false positives, which are equally important.

This evaluation did not take into account the fact that an EHR-based surveillance system would require additional work for implementation and subsequent maintenance for the purposes of signal detection (versus SRSs, which are already established for such activities) and would thus incur extra costs, which would be difficult to estimate.

Our study is one of the few studies performed so far that have tried to explore how an SRS and an EHR-based system might be used together with the aim of augmenting drug safety surveillance. A previous study by Harpaz et al. [12] had a similar aim but a different strategy, trying to combine information from both data sources at a very early stage in order to improve the ranking of signals by replication of findings. In contrast, we explored scenarios where use of EHRs can fill the gaps and add value to already existing systems.

5 Conclusion

The more prudent goal in signal detection is identification not of all signals but of the majority of signals in the most efficient way—with the least time- and resource-consuming approach. With this aim in mind, we showed that an EHR-based system may complement an SRS in certain situations, especially in the presence of adverse events with a high background incidence. While the SRS appeared to be more cost effective overall, for some events that are very hard to pick up, the costs associated with additional signal detection in an EHR-based system may be justifiable.