“The world is full of signals that we don’t perceive,” wrote Stephen Jay Gould in “The Panda’s Thumb” [1]. He was suggesting that physical limits of human perception (hearing, smell, touch, taste) lead us to miss signals all around us that other animals with more acute senses can use to their advantage. The fundamental challenge of signal detection [simultaneously heightening both perception (sensitivity) and discrimination (specificity)] is embodied in this quotation, and the paper by Li and colleagues [2] in this issue of Drug Safety aims to turn up the volume in an effort to improve signal detection while screening out the excess noise that comes with it. The “sound” on which the authors seek to improve signal detection is not the mating call from a far-away member of the same species, but rather the occurrence of an adverse event resulting from medication use.

Patients take medications with the expectation that the benefits will outweigh the risks, and the public expects a positive benefit-risk balance for marketed drugs, provided they are used appropriately. The evidence for both benefits and risks that serves as a foundation for this assessment is constantly being updated and each new piece of information might alter the balance. Adverse event monitoring contributes to these assessments, and the adverse event reports received by the Food and Drug Administration Adverse Event Reporting System (FAERS) represent an ongoing engagement of practitioners, patients, and manufacturers with the US Food and Drug Administration (FDA). The FDA receives more than a million adverse event reports per year, a number that has increased substantially over recent years [3].

This expanding resource for drug safety information is only beneficial if put to constructive use, and the number of reports received by the FAERS rules out certain approaches; reviewing individual case reports and discerning patterns across them becomes impractical at numbers much smaller than those in the FAERS. Various statistical signal detection methods (such as disproportionality analyses) are available to mine this wealth of data and identify potential signals, permitting more targeted investigation and more efficient allocation of limited resources [4]. If a signaling approach generates too many alarms, alarm fatigue might follow, with the consequence of inaction when action is truly needed. Filtering the signals to reduce false positives is one way to refine a signaling method, and it becomes more valuable as the number of adverse event reports increases. However, Evans [4] points out that tinkering with the signaling methods or simply increasing the size of an existing data source will likely only produce marginal gains and that substantial improvement in signal detection requires new types of data, such as might be obtained by combining different types of existing data sources.
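As a rough illustration of what a disproportionality analysis computes, the sketch below derives two widely used measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), from a 2×2 table of report counts. The choice of these two measures, the function names, and the counts are assumptions made for illustration; they are not taken from FAERS or from the paper under discussion.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table of spontaneous report counts.

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with any other drug and the event of interest
    d: reports with any other drug and any other event
    """
    return (a / (a + b)) / (c / (c + d))


def reporting_odds_ratio(a, b, c, d):
    """ROR from the same 2x2 table (cross-product ratio)."""
    return (a * d) / (b * c)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
a, b, c, d = 20, 980, 100, 98900

prr = proportional_reporting_ratio(a, b, c, d)
ror = reporting_odds_ratio(a, b, c, d)
print(f"PRR = {prr:.2f}, ROR = {ror:.2f}")
```

A value well above 1 indicates that the event is reported disproportionately often with the drug compared with all other drugs in the database. In practice a screening threshold, usually together with a minimum case count, determines which drug-event pairs are flagged for review.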

Li and colleagues [2] extend the framework of adverse event signal detection by applying their methods in combination across different data sources, including a spontaneous reporting system, an inpatient electronic health record system, an outpatient electronic health record system, and a health insurance claims database. They compare the databases across four categories of adverse event (acute myocardial infarction, gastrointestinal bleed, acute renal failure, and acute liver injury), developing confounder-adjusted signal scores using LASSO (Least Absolute Shrinkage and Selection Operator) regression as measures of the strength of association between each medication and the target adverse event. They calibrate these scores using “reference negatives” and then combine the scores across the different data sources to see whether the combination improves signaling relative to each data source alone. Their metric of performance is the area under the receiver operating characteristic curve, which reflects both sensitivity and specificity. This is a sensible metric for evaluating the method, with the caveat that it requires a priori knowledge of which drug-adverse event pairs are true positives and which are true negatives. Their results are in line with theory, in that performance tends to be better in combined data. The exception is the combination of the FAERS with inpatient electronic health records, where the small sample size and the restricted types of adverse events detectable (those that occur in hospital or lead to hospitalization) limit this particular test case. There may be some information lost in their approach of combining a primary suspect medication in an adverse event report with concomitant medications, in that it downplays the clinical judgment of the adverse event reporter. While this information may be subject to a range of biases, as the authors point out, future work may find ways to use it to improve signaling. Another limitation that may be amenable to future improvement is the assumption of a single odds ratio for a drug-adverse event combination, in that it ignores the heterogeneity that is likely present within or across data sources.
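The evaluation logic described above (score each drug, calibrate against reference negatives, combine across sources, and compare areas under the curve) can be sketched in a few lines. This is a toy illustration rather than the authors' procedure: the scores are invented, calibration is assumed to be a simple standardization against the reference negatives, the combination is assumed to be a plain average, and no LASSO model is fitted.

```python
from statistics import mean, stdev


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison:
    the share of (positive, negative) pairs ranked correctly, ties counting half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibrate(scores, labels):
    """Assumed calibration: standardize every score against the mean and
    standard deviation of the reference negatives (label 0)."""
    neg = [s for s, y in zip(scores, labels) if y == 0]
    mu, sd = mean(neg), stdev(neg)
    return [(s - mu) / sd for s in scores]


# Hypothetical reference set: 1 = known positive, 0 = reference negative.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical signal scores for the same six drugs in two data sources.
source_a = [2.0, 0.4, 1.5, 0.5, 0.1, 0.3]
source_b = [0.2, 1.8, 1.1, 0.3, 0.6, 0.1]

cal_a = calibrate(source_a, labels)
cal_b = calibrate(source_b, labels)
# Assumed combination rule: average the calibrated scores drug by drug.
combined = [(x + y) / 2 for x, y in zip(cal_a, cal_b)]

auc_a = auc(source_a, labels)
auc_b = auc(source_b, labels)
auc_combined = auc(combined, labels)
print(auc_a, auc_b, auc_combined)
```

In this invented example each source misranks a different positive, so the averaged score separates positives from negatives better than either source alone. Calibration leaves the within-source ranking (and therefore each source's own area under the curve) unchanged; its role here is only to put the two sources on a comparable scale before they are combined.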

Similar data sources, such as different health insurance plans, may be combined in pursuit of increased sample size (increasing the breadth of a data resource), or different data sources (such as medical and laboratory data) may be combined to enrich the data (increasing the depth of a data resource). The separation between different ‘types’ of data (spontaneous reports, registries, clinical trials, and insurance claims data) is becoming less clear as data across platforms can now feasibly be combined in ways that were previously unimaginable. Large collections of electronic health records linked to claims data may be screened with natural language processing, permitting drug safety reporting outside spontaneous reporting systems, a capability that promises to merge the strengths of spontaneous reporting with those of insurer databases [5]. Further, novel data sources such as social media can also be mined for adverse events related to drugs [6].

While it is comforting to know that existing data sources for adverse event monitoring continue to expand not just in terms of breadth, but also depth and timeliness, we anticipate that new data types will add previously unknown dimensions to benefit-risk assessments. This era of data bigness moves us closer to the ability to conduct robust adverse event monitoring in near real time. Of course, fundamental principles still apply, so that suitable expertise in medicine, pharmacology, and research methods can partner with computer science to guide the development of tools that permit screens to be conducted in pharmacologically plausible ways within etiologically relevant risk windows.

Human predilection for false positives can be imagined as an evolutionary adaptation to the recurrent challenge of detecting a predator. The energy wasted in activating the fight or flight response whenever a shadow is seen or a sound is heard might confer an evolutionary advantage, even if it only rarely enables us to evade predation, because passing genes to the next generation is strongly contingent on this outcome. Many of Gould’s essays provide a warning against the facile speculation embodied in stories such as this, but also include a warning against the too-hasty dismissal of the facts on which they are founded. This literary device embodies the challenge of signal detection and the aim of the paper by Li and colleagues, an aim made all the more pressing by potential future data availability.