FormalPara Key Points

In a scoping review, we have investigated the types of evidence that have been used to support 2421 pharmacovigilance signals communicated internationally.

Signals based only on anecdotal reports have become increasingly common and dominate the literature (70%).

Most signals based only on anecdotal reports are supported by evidence involving temporality and dechallenge/rechallenge, but clear reporting of how judgments of association or causality are made is uncommon (13%).

1 Introduction

Interventional and observational studies, along with anecdotal reports, can offer evidence that an event or a group of events is causally linked to one or more medicinal products. Such indications are typically referred to as signals of adverse drug reactions (ADRs) [1]. When they are solely supported by ratios of observed over expected counts of reports of ADRs, they are called signals of disproportionate reporting (SDRs) [1]. A signal can be described as a hypothesis, supported by data and arguments, that there is a causal association between a medicinal product and an adverse event [2] that may justify further investigations aimed at quantifying the risk of an ADR. One of the objectives of pharmacovigilance is the detection of such signals, which is carried out by stakeholders such as regulatory agencies, research organizations, academia, and pharmaceutical companies. Overviews of the evidence underpinning signals, limited to the European Union (EU) [3], have shown that detected signals were chiefly based on case reports and were more likely to lead to regulatory action when supported by additional study designs [4]. Combined qualitative/quantitative algorithms for signal detection have suggested that the recency of reports of ADRs, their geographical spread, and the availability of narratives [5] were factors predictive of signals that led to regulatory action, such as inclusion in Summaries of Product Characteristics. However, further understanding of features (such as positive dechallenge or temporality) that are considered during clinical assessments of reports may complement such findings. Additionally, strategies to detect signals have progressively improved, reducing the time to signal detection, by adopting machine-readable Summaries of Product Characteristics, or, to a lesser extent, using clinically related concepts to specify outcomes of interest, as summarized in the PROTECT guidelines [6]. Nevertheless, the median time it takes for a signal to be communicated (time to communication, TTC) remains to be quantified. We sought to review the global published evidence that has been used to support signals, and chose a scoping review design.

The aims of the present review were:

(a) To provide regulatory decision makers with a synthesis of the evidence that is used to support signals of ADRs.

(b) To describe the frequencies of the features of reports of ADRs explicitly referred to as supportive of signals in clinical assessments.

(c) To determine and compare the levels of evidence of signals/SDRs across stakeholders in pharmacovigilance.

(d) To obtain insights into the TTC and to discover whether it has changed over time.

2 Methods

This study follows the 2018 Preferred Reporting Items for Systematic Reviews and Meta-analyses extension for Scoping Reviews [7]. We have previously published [8] and registered the protocol of this work on the Open Science Framework (registration number: osf.io/a4xns). We report all protocol deviations and additional information in the Electronic Supplementary Material (ESM).

In brief, in the first week of September 2020, we searched PubMed (https://pubmed.ncbi.nlm.nih.gov/), EMBASE (OvidSP), PsycINFO (OvidSP), Science Citation Index (Web of Science Core Collection), and Google Scholar (https://scholar.google.co.uk/), and in April 2021, we searched OpenGrey (https://opengrey.eu/) and GreyNet International (https://www.greynet.org/), the English websites of 35 regulatory agencies/authorities, drug bulletins, and the SIGNAL Document of the Uppsala Monitoring Centre (from the start of each source until 31 August 2020). We complemented searches with backward citation screening. We sent Freedom of Information requests to the US Food and Drug Administration (FDA) and the European Medicines Agency, and contacted other regulators, to obtain full lists of signals. In this way, we retrieved large numbers of signals from Malaysia, Japan, Canada, and New Zealand, including items that, while not explicitly described as signals, were deemed as such by the regulators. When we did not retrieve records that were described using expressions other than “signal”, we ensured that our retrieval strategy was consistent by confirming with regulators whether their lists of communications included both signals and other forms of communications of risks. The complete search strategy is detailed in Table 1 of the ESM.

We included associations detected in clinical assessments of reports of ADRs and observational/interventional studies whose original authors used the term “signal” in the abstract/full text, if the ADRs had not been documented previously. We considered an ADR as previously undocumented if it was:

(1) Unlisted on Summaries of Product Characteristics or equivalent documents in the warnings, contraindications, or untoward effects sections, as reported by the original authors at the time of communication; or

(2) Not previously communicated via direct healthcare professional communications or other country-specific channels, as reported by the original authors at the time of communication.

If information on (1) or (2) above was unavailable in peer-reviewed publications, we verified that an ADR had not been described in the cited references of each included record in interventional or observational studies, i.e., that its risk had not been previously quantified. When none of this information was available, we adopted an agnostic approach, to avoid omitting possibly relevant publications, and analyzed their findings separately, but did not calculate the TTC.

We also included papers that described SDRs. We required SDRs to present complete thresholds for detection, requesting them from original authors if missing. We retained the earliest version of repeated communications of the same signal by the same stakeholder (the first author of a communication) and combined new information appearing in repeated communications with earlier information.

We excluded records that lacked full texts, records that we could not translate, and records in which the authors explicitly stated that the evidence did not support a signal or that no signals were detected. We omitted signals/SDRs concerning the following: medical devices without active ingredients, supplements, lack of efficacy, medication errors, or beneficial effects.

We coded features of reports of ADRs to a codebook developed iteratively (see the ESM). Related codes were coalesced, or grouped according to the Bradford Hill guidelines, where feasible.

We attributed levels of evidence, applying the Oxford Centre for Evidence-Based Medicine (OCEBM) tool [9], to studies supporting signals. The OCEBM levels of evidence tool ranks study designs from 1 to 5, based on whether they are likely to provide the best available evidence for decision making, 1 being the highest level and 5 the lowest. We followed the row “What are the rare harms?” and omitted level 5 (mechanism-based reasoning) [10]. Thus, systematic reviews of randomized trials or n-of-1 trials were OCEBM level 1; examples of level 2 study designs included individual randomized trials, of level 3 prospective cohort studies with appropriate follow-up, and of level 4 series of case reports or case-control studies. For studies that fell outside the OCEBM levels, we postulated subtypes. Where we could not, we assigned no level of evidence. If multiple studies supported a signal, the highest level of evidence among them applied (see the ESM).
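The rule that the highest level of evidence applies when several studies support one signal can be sketched in Python as follows. This is only an illustration: the function name `best_ocebm_level` and the use of `None` for studies without an assignable level are assumptions made for this example and are not part of the review's tooling (the data were charted in Microsoft Excel).

```python
from typing import Iterable, Optional


def best_ocebm_level(levels: Iterable[Optional[int]]) -> Optional[int]:
    """Return the strongest OCEBM level among the studies supporting a signal.

    Level 1 is the highest and level 4 the lowest used here (level 5 was
    omitted). None marks a study to which no level could be assigned
    (hypothetical convention for this sketch).
    """
    assigned = [level for level in levels if level is not None]
    for level in assigned:
        if level not in (1, 2, 3, 4):
            raise ValueError(f"unexpected OCEBM level: {level}")
    # The numerically smallest level is the highest-quality evidence.
    return min(assigned) if assigned else None
```

For example, a signal supported by a case series (level 4) and an individual randomized trial (level 2) would be assigned level 2 under this rule.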

We used two units of analysis: (1) studies and (2) signals.

(1) Multiple studies of any design or any level of evidence supporting a signal were counted as one, as was a single study supporting multiple signals. This unit was distinct from publications, which could report more than one study. Studies were used in the characterization of the level of evidence.

(2) Signals, as drug-event or drug-drug event combinations, were used to count the features of ADR reports.

All data were analyzed descriptively in primary and secondary analyses. The primary analysis focused on previously undocumented signals/SDRs (see the ESM) and characterized the level of evidence and stakeholder over time (counts by studies), as well as the features of reports of ADRs (counts by signals).

For each drug-event or drug-drug event combination in the primary analysis, we selected the ‘first report’ of an ADR based on the earlier of two dates: the first date on which a report of the ADR was entered into VigiBase, the World Health Organization’s global database of reports of suspected ADRs (minimum value of the E2b field FirstDateDatabase; data lock point: 30/08/2020), or the first date on which a report of the ADR was received by a regional/national pharmacovigilance center (minimum value of the E2b field ReceiveDate). We calculated the TTC by subtracting the year of the first report in VigiBase from the year of communication of a signal/SDR. This approach was similar to that used in a previously published set of systematic reviews on withdrawals of marketing authorizations [11, 12], except that we relied on unpublished reports of ADRs rather than peer-reviewed ones.
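The TTC calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function and parameter names are hypothetical, the two date arguments stand for the minimum values of FirstDateDatabase and ReceiveDate across all reports of a given combination, and a missing date is represented by `None`.

```python
from datetime import date
from typing import Optional


def first_report_year(first_date_database: Optional[date],
                      receive_date: Optional[date]) -> Optional[int]:
    """Year of the 'first report': the earlier of the two available dates."""
    known = [d for d in (first_date_database, receive_date) if d is not None]
    return min(known).year if known else None


def time_to_communication(communication_year: int,
                          first_date_database: Optional[date],
                          receive_date: Optional[date]) -> Optional[int]:
    """TTC in years: year of communication minus year of the first report.

    Returns None when no report exists for the combination (no TTC value).
    A negative result means the first report postdates the communication.
    """
    first_year = first_report_year(first_date_database, receive_date)
    if first_year is None:
        return None
    return communication_year - first_year
```

Because only years are subtracted, a signal communicated in the same calendar year as its first report has a TTC of 0 regardless of the months involved.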

The secondary analysis encompassed: (a) SDRs with ambiguous thresholds; (b) signals/SDRs whose prior ADR documentation was unclear; and (c) “laboratory signals” or signals from hospital monitoring. In it, we provided counts by studies for the level of evidence and stakeholders.

Records were managed in EndNote (version 8.2). One author (DS) retrieved, screened, and charted the data, a second cross-validated the findings (IJO), and a third settled disagreements after discussion (JKA). A member of the Uppsala Monitoring Centre retrieved the dates of the reports of ADRs to calculate the TTC. All data were charted and analyzed in Microsoft Excel.

3 Results

3.1 Overview of Publications and Their Distribution Over Primary/Secondary Analyses

We identified 9525 non-duplicate citations, of which 1509 were considered eligible, based on title/abstract screening (see Fig. 1). Through hand searches, we identified 2260 eligible publications and excluded 522 after a full-text review. In all, we included 2132 publications for analysis, corresponding to 2591 studies, 9167 signals, and 4881 SDRs. Over 5000 signals came from a single study on drug–drug interactions [13]. Table 1 shows the distribution of studies and publications between the primary and secondary analyses. See the ESM for responses to our requests to regulators, for the full list of included and excluded records, and for the data charting form.

Fig. 1

Preferred reporting items for systematic reviews and meta-analyses flow chart of the scoping review. ¹ Includes: 59 abstracts subsequently published as papers, 42 previously communicated signals, 3 duplicate publications. ² Includes: 41 non-systematic (narrative) reviews, 17 descriptive (quantitative) analyses, 9 measurements of usefulness of sources of information or biases affecting disproportionality analyses, 4 commentaries, 1 creation of an interventional program, 1 evidence mapping. ³ Includes: 16 records of beneficial effects or drug repurposing, 18 of medication errors, false-positive laboratory abnormalities, or increases in plasma concentrations of a medicinal product after suspected drug–drug interactions. ⁴ Includes: 9 not concerning medicinal products, 4 without data in humans (i.e., simulation studies), 1 withdrawn publication. ⁵ 1728 from: 16 cited references in electronic records, 4 from Google Scholar, 2 from original authors, 585 from cited reference in the gray literature, 999 from websites, 122 from organizations. ADR adverse drug reaction

Table 1 Distribution of publications, studies, signals, and signals of disproportionate reporting (SDRs) between primary/secondary analyses, together with distribution of the same over publications/studies designed to clinically assess reports of adverse drug reactions (ADRs)

3.2 Primary Analysis

The primary analysis included 1974 publications, or 2421 studies, that communicated 9000 signals and 1861 SDRs; 2242 signals originated exclusively from clinical assessments of ADR reports (1683 studies or 941 publications). For 225 (13%) clinical assessments of ADR reports, presenting 228 signals, there were explicit judgments on the features of reports of ADRs (e.g., plausible time to onset) that supported signals. Only three signals in this subset were supported by multiple types of evidence, while the rest were exclusively supported by one.

3.2.1 Features of the Reports of ADRs Supporting Signals

Across the 228 signals, we recorded 12 distinct supporting features: positive dechallenge, temporality, positive rechallenge, exclusion of competing causes, single suspected drug, case ascertainment, consistency, biological gradient, specificity, coherence, reporter type, and reported causality assessment. Positive dechallenge/rechallenge were included in ‘experimental evidence’, and ‘single suspected drug’ in ‘exclusion of competing causes’ (see Table 2).

Table 2 Counts of features explicitly reported in the included clinical assessments of reports of adverse drug reactions

We further categorized the signals as supported by individual or multiple features: 89 were supported by one feature, 79 by two, 48 by three, and 12 by more than three. When supported by only one feature, the most frequent were temporality (n = 37/89), positive dechallenge (n = 23/89), and positive rechallenge (n = 13/89). Overall, temporality combined with positive dechallenge was the most frequent co-occurrence (72/228), followed by positive dechallenge with rechallenge (63/228); positive dechallenge and/or rechallenge co-occurred with temporality in 77/228 signals. See Fig. 2 for signals supported by two or three features (full data available on request).
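Co-occurrence counts of this kind can be obtained by counting unordered pairs of features within each signal. The Python sketch below illustrates the idea; the function name and the three example signals are made up for illustration and do not reproduce the review's data or tooling.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple


def cooccurrence_counts(signals: Iterable[Set[str]]) -> Counter:
    """Count, for every unordered pair of features, the signals citing both."""
    pairs: Counter = Counter()
    for features in signals:
        # Sorting gives each pair a single canonical (alphabetical) order.
        for pair in combinations(sorted(features), 2):
            pairs[pair] += 1
    return pairs


# Hypothetical example: three signals and the features invoked for each.
example_signals = [
    {"temporality", "positive dechallenge"},
    {"temporality", "positive dechallenge", "positive rechallenge"},
    {"temporality"},
]
counts = cooccurrence_counts(example_signals)
```

A signal supported by a single feature contributes no pairs, which is why such signals are reported separately from the co-occurrence counts.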

Fig. 2

Schematic representation of the 79 and 48 signals supported by two or three features, respectively. Each node is labeled after the features invoked in clinical assessments of adverse drug reaction reports; linked nodes indicate features that co-occurred. In brackets, count of signals per feature. Excl. excluding

3.2.2 Levels of Evidence

There were 1974 publications in the primary analysis, corresponding to 2421 studies: regulators communicated 1563, private foundations 672, academia 168, healthcare workers 10, and pharmaceutical companies 8. OCEBM level 4 was the most frequent. In 71 cases, the study design was unclear.

Analysis by OCEBM subtypes showed that the largest contributors to level 4 were studies matching this level (n = 1778), most of which (n = 1683) were qualitative analyses of reports of ADRs alone, followed by disproportionality analyses (DAs) and analyses including studies with unclear designs. Disproportionality analyses mostly (n = 111/181, 61%) came from academia, fewer (n = 43/181, 24%) from regulators. OCEBM 1 and 2 mostly comprised randomized controlled trials (RCTs), either pooled or individual, that did not prespecify the outcomes of interest. Most of the OCEBM 3 studies matched this level (see Table 3).

Table 3 Distribution of 2421 studies over the Oxford Centre for Evidence-Based Medicine (OCEBM) Levels of Evidence, partitioned by subtypes and stakeholders

Over time, across all stakeholders, the number of studies per year remained at around 50 from 1986 (first recorded date) to 2010, with an initial increase beginning in 2011 and a larger increment from 2013 onwards (see Fig. 3). Considering clinical assessments of reports of ADRs alone, we recorded a yearly average of 36 studies in 2000–2012 and 116 in 2013–2019, corresponding to about a three-fold increase (3.2). This was a 3.3-fold increase for studies matching OCEBM 4 as a whole. For OCEBM 1–3, the average was 10 for the first period and 20 for the second, i.e., a near two-fold increase (1.8). Out of 181 DAs, 107 (59%) were communicated in 2013–2019 (25 in 2013–2015 and 82 in 2016–2019; data available on request). On average, there were 2.8 DAs per year in 2000–2012 and 15 per year in 2013–2019 (ratio 5.4). During this second period, academia communicated on average 11 DAs per year.
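The fold increases quoted above are ratios of yearly averages between the two periods. The short Python sketch below reproduces two of them from the rounded averages given in the text; the function name `fold_change` is an assumption for this example, and small discrepancies with ratios computed from unrounded data are to be expected.

```python
def fold_change(earlier_mean: float, later_mean: float) -> float:
    """Ratio of the later yearly average to the earlier one."""
    if earlier_mean <= 0:
        raise ValueError("earlier_mean must be positive")
    return later_mean / earlier_mean


# Clinical assessments of ADR reports: 36 per year (2000-2012) vs 116 (2013-2019).
clinical_fold = fold_change(36, 116)

# Disproportionality analyses: 2.8 per year (2000-2012) vs 15 (2013-2019).
da_fold = fold_change(2.8, 15)

# 107 DAs over the 7 years 2013-2019 give the yearly average for that period.
da_yearly_mean = 107 / 7

print(f"clinical assessments: {clinical_fold:.1f}-fold")
print(f"disproportionality analyses: {da_fold:.1f}-fold")
print(f"DAs per year in 2013-2019: {da_yearly_mean:.0f}")
```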

Fig. 3

Stacked bar graph, showing the numbers of unique studies grouped by year and by Oxford Centre for Evidence-Based Medicine (OCEBM) levels, irrespective of stakeholder, concerning 2350 studies (71 classified as N/A were omitted). In red OCEBM 1, yellow OCEBM 2, blue OCEBM 3, and dark green OCEBM 4 (excluding subtype 4, i.e., disproportionality analyses [DAs]); in light green, OCEBM subtype 4

3.2.3 TTC of Signals

We computed, where possible, the TTC for signals/SDRs in the primary analysis. Of 10,861 signals/SDRs, 7992 had a TTC ≥ 0 and 762 a TTC < 0. Across all OCEBM levels and subtypes, the TTC was 0–51 years. There were 6200 signals with a TTC between 0 and 15 years, 1396 with a TTC between 16 and 30 years, and 396 with a TTC > 30 years, with a median of 9 years. The median value was 8 years in 2000–2012 and 9 years in 2013–2019. See the ESM for a list of all the years of first reports in VigiBase.
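The banding of TTC values can be sketched in Python as follows. The sketch assumes non-overlapping bands (the upper band starting above 30 years) and that the median is taken over non-negative TTC values only; both are assumptions made for this illustration, as are the function names and band labels.

```python
import statistics
from collections import Counter
from typing import Iterable, List, Optional


def ttc_band(ttc: Optional[int]) -> str:
    """Assign a TTC value (in years) to a reporting band."""
    if ttc is None:
        return "no TTC"   # no report found for the combination
    if ttc < 0:
        return "< 0"      # first report postdates the communication
    if ttc <= 15:
        return "0-15"
    if ttc <= 30:
        return "16-30"
    return "> 30"


def band_counts(ttcs: Iterable[Optional[int]]) -> Counter:
    """Tally signals/SDRs per TTC band."""
    return Counter(ttc_band(t) for t in ttcs)


def median_nonnegative_ttc(ttcs: Iterable[Optional[int]]) -> float:
    """Median over TTC values >= 0 (assumed scope of the reported median)."""
    valid: List[int] = [t for t in ttcs if t is not None and t >= 0]
    return statistics.median(valid)


# Consistency check on the reported counts: the three bands sum to 7992.
assert 6200 + 1396 + 396 == 7992
```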

3.3 Secondary Analysis: Levels of Evidence in Signals/SDRs Whose Prior Documentation was Unclear

We analyzed the level of evidence of 170 studies separately (158 publications, 3187 signals/SDRs): for 121 (71%), we could not ascertain whether the ADRs of interest had been previously documented; 26 (15%) did not report a threshold, and we received no replies to requests for clarification (among which 14 had unclear prior documentation of the ADR); 9 (5.3%) reported incomplete values of disproportionality; 8 (4.7%) detected signals from laboratory values (1 in hospital monitoring); in 3 (1.8%), we recorded a mismatch between the reported threshold and entries regarded as SDRs; and for 3 (1.8%), no values of disproportionality were reported.

The most frequent OCEBM level was 4 (n = 161), followed by studies to which we could not assign a level of evidence (n = 7), which consisted of laboratory signals/hospital monitoring. Analysis of OCEBM sub-levels showed that the largest contributors were DAs (n = 140, 82%).

4 Discussion

4.1 Summary of Main Findings

Reporting of judgments in clinical assessments is infrequent (225/1683 studies). Our results show that the most frequent features of reports of ADRs discussed in 228 signals were ‘experimental evidence,’ ‘temporality,’ and ‘exclusion of competing causes’. Temporality and positive dechallenge often co-occurred, as did positive dechallenge and rechallenge. We found 2591 studies across websites, bulletins, and electronic databases; OCEBM level 4 was the most frequent overall, with clinical assessments of reports of ADRs making up most of this category. In 2013–2019, these increased three-fold, while other studies (OCEBM 1–3) doubled compared with previous years (2000–2012). Disproportionality analyses without clinical assessments were communicated about five times more often in later years and mostly by academia.

4.2 Features of Reports of ADRs

The features we identified are generally in keeping with previous research; they may reflect the ‘strength of evidence’ of reports of ADRs [14]. We have shown that ‘experimental evidence’ was among the most frequently considered features. The presence of positive dechallenge/rechallenge has been linked to a higher likelihood of amendments to section 4.8 of product information in the EU [4]. Similarly, a predictive model for the utility of case reports, developed by the FDA [15], suggested that positive dechallenge, positive rechallenge, and ‘designated medical events’ were among the strongest predictors of inclusion of reports of ADRs in assessments of case series. The ‘exclusion of competing causes’ may be understood by considering that reports containing more than one suspected medicinal product tended to be excluded from assessments [15].

In contrast, other features were infrequent, such as ‘biological gradient,’ ‘specificity,’ and ‘coherence’. This may be partly explained by the degree of completeness of reports of ADRs [16]. A review of the features of reports of ADRs of the FDA’s Potential Signals of Serious Risks suggested that the number of reports and prior documentation of an ADR in the year before regulatory action were predictive of regulatory actions [17]; neither appeared in our list of features. That review drew features of reports by analyzing the FDA Adverse Event Reporting System, whereas we extracted them directly from the clinical assessments, which may explain the discrepancy.

Few studies clearly presented judgments in support of signals. For context, these were available in 225/1683 clinical assessments of ADR reports (see Table 1). Current guidelines encourage consideration of features of reports in clinical assessments [18]; thus, in the included studies, features may have been accounted for even where judgments were not clearly reported. For instance, ‘reported causality’ (i.e., the availability of information on single-case causality assessments in structured fields) was one of the least frequent features. However, this feature was described as helpful in a decision-support system for signal validation when SDRs were confounded by indication [19]. To facilitate a critical interpretation of signals, judgments on features of ADR reports could be made clearer in clinical assessments (e.g., [20, 21]).

4.3 Levels of Evidence Underpinning Signals

Consistent with previous reviews [22,23,24], OCEBM level 4 was the most frequent. Between 2010 and 2018, RCTs formed the basis of 36% of FDA Drug Safety Communications [24], giving some support to the lower frequencies of OCEBM 1–2. The recent (2013–2019) increasing trend in OCEBM 4 studies, partly mirrored by OCEBM 1–3, continues to underscore the importance of reports of ADRs in pharmacovigilance and a growing relevance of interventional studies or systematic reviews of RCTs for signal detection. Well-reported RCTs often provide the first indications of the most frequent adverse effects of an intervention [25]; an increase in the use of RCTs as supporting evidence for signals may call for clearer reporting of detected signals in these study designs.

We might have risked providing a biased overview of higher quality types of evidence had we assigned the levels of evidence exactly as prescribed by the OCEBM tool. For instance, the highest level of evidence, level 1, comprises four subtypes. From a critical perspective, a meta-analysis that includes trials that prespecify the outcome of interest provides a higher quality of evidence than one that does not. The use of subtypes allowed us to disentangle records that concerned disproportionality analyses alone from clinical assessments of reports of ADRs, as they would both be formally categorized as OCEBM level 4. In turn, we were able to show that DAs alone are rarely communicated by regulatory agencies and are primarily published by academia. While there were examples of published DAs [26,27,28] triggering regulatory procedures [29,30,31], further analyses may be needed to understand to what extent published DAs have a bearing on regulation in pharmacovigilance and whether a growing use of these methods is warranted to answer questions on drug-induced harms. We have shown that, despite a five-fold increase in DAs in recent years, regulators seldom rely on DAs alone and that some DAs lack information on prior documentation of ADRs. Earlier research has shown that DAs alone rarely support regulatory decisions and may instead follow or confirm them [32], that they can suffer from spin or overinterpretation [33], and that determining whether SDRs merit further action demands substantial resources [34].

4.4 TTC of Signals

The calculation of TTC could give rise to four scenarios: (1) TTC > 0, (2) TTC = 0, (3) TTC < 0, or (4) no TTC value. The first two would indicate that reports in VigiBase for a given signal/SDR were available at the time of communication, the third that while there were reports in VigiBase, they may have been entered in the database only after the year of communication. The last would indicate that there were no reports in VigiBase for a given medicinal product and event as of the data lock point.

A median TTC of 9 years is consistent with similar analyses of time to signal detection, showing a median time of 10 years [35]. There was a difference of 1 year in median TTC for the periods 2000–2012 and 2013–2019. However, prior documentation of ADRs may affect the TTC [36], as may the time on market of medicinal products [37] or the public health impact/intensity of ADRs in signal prioritization [4, 14]. The number of reports of suspected ADRs has grown with time, and VigiBase holds over 30 million reports as of 2022. Reports, however, still lack quality, with a large proportion missing narratives [38]. Lack of guidance on improving the quality of reports has previously been highlighted [39]. As such, our findings may suggest that there are delays before a minimum number of sufficiently complete reports are received by the competent regulators; reducing the TTC may require strategies to emphasize the usefulness of complete reports of ADRs in signal detection, targeting the reporters. A follow-up study to investigate whether completeness of reports of ADRs and other variables may affect the TTC is in progress.

4.5 Strengths and Limitations

The key strengths of this study lie in its exhaustiveness, absence of time restrictions, and use of inclusion criteria allowing retrieval of signals/SDRs, rather than established or identified risks (as defined in [40]). To the best of our knowledge, this is the first attempt at systematically collating and describing signals of ADRs globally.

Our primary aim was to characterize the evidence on which signals are based. To present the results of the study succinctly, we did not investigate the regulatory actions that followed signals. However, we plan to address this area in a future review.

Our results should be viewed in the light of the decision not to consider ‘biological plausibility’ or interpret disproportionality measures as evidence for ‘strength of association,’ whereas some signals may solely be supported by these considerations. It is also reasonable to assume that unreported judgments about features of reports of ADRs do not necessarily entail absence of judgments altogether.

We merged updates to communications of signals (see “Methods”), so the evidence base underpinning signals may have accrued for communications updated over time. In other words, the follow-up time for communications may be unevenly distributed, particularly for later years (e.g., 2015–2020); thus, studies dated earlier (e.g., 2010–2014) may yield higher quality evidence. Another factor that might affect trends in OCEBM levels is the completeness of the eligible studies. For example, signals could be included in full assessment reports or their summaries, depending on the regulator (see the ESM). Such differences may partly explain the presence of OCEBM subtype 4. Low frequencies of signals/SDRs primarily communicated by pharmaceutical companies may have arisen because we defined the first author as the communicator (see the ESM); an FDA review in fact suggested that most changes to product information in the USA were initiated by marketing authorization holders [22].

Misclassifications of signals, already noted in the early 2000s [41, 42], persist. Not only were signals reported across different webpages on websites of regulators, but they were also described using expressions that are defined differently by the Council for International Organizations of Medical Sciences [43] (see the ESM). While misclassifications may have led to the omission of relevant records, as we required findings to be explicitly described as “signal(s)”, we mitigated their influence by reaching out to regulators and, in keeping with our inclusion criteria, asking which communications constituted signals, irrespective of the expressions used to qualify them (e.g., ‘safety issue’). Further harmonization may be needed to collect all signals in dedicated sections of drug regulatory websites.

A possible concern could also be that some methods papers that detected undocumented SDRs were included. To this end, we categorized papers, based on their aims, as to whether they developed or evaluated methods, but found only a small proportion (data not shown).

5 Conclusions

Judgments are clearly presented in only a small proportion of clinical assessments of ADR reports. When they are, experimental evidence, temporality, and exclusion of competing causes are key features supporting signals. Clinical assessments of reports of ADRs alone have increasingly supported signals in recent years. Over the same period, the number of signals based on evidence of better quality has also grown, but not at the same rate. Disproportionality analyses were published mainly by academia in 2013–2019, while regulators rely more often on other OCEBM level 1–4 evidence than on DAs alone. Future research should evaluate the usefulness of DAs for decision makers.