Computational Approaches for Pharmacovigilance Signal Detection: Toward Integrated and Semantically-Enriched Frameworks
Computational signal detection constitutes a key element of postmarketing drug monitoring and surveillance. Diverse data sources are considered within the ‘search space’ of pharmacovigilance scientists, and respective data analysis methods are employed, each with its own strengths and shortcomings, towards more timely and accurate signal detection. Recent systematic comparative studies have highlighted not only differences in performance across methods depending on the event and the data source, but also their complementarity. These findings reinforce the arguments for exploiting all possible information sources for drug safety and for the parallel use of multiple signal detection methods. Combinatorial signal detection has been pursued in only a few studies to date, employing a rather limited number of methods and data sources but showing promising outcomes. However, the large-scale realization of this approach requires systematic frameworks to address the challenges of the concurrent analysis setting. In this paper, we argue that semantic technologies provide the means to address some of these challenges, and we particularly highlight their contribution in (a) annotating data sources and analysis methods with quality attributes to facilitate their selection given the analysis scope; (b) consistently defining study parameters such as health outcomes and drugs of interest, and providing guidance for study setup; (c) expressing analysis outcomes in a common format enabling data sharing and systematic comparisons; and (d) assessing/supporting the novelty of the aggregated outcomes through access to reference knowledge sources related to drug safety. A semantically-enriched framework can facilitate seamless access to, and use of, different data sources and computational methods in an integrated fashion, bringing a new perspective for large-scale, knowledge-intensive signal detection.
A number of comparative studies assessing various signal detection methods applied to diverse types of data have highlighted the need for combinatorial-integrated approaches.
Large-scale integrated signal detection requires systematic frameworks in order to address the challenges posed within the underlying concurrent analysis setting.
Semantic technologies and tools may provide the means to address the challenges posed in integrated signal detection, and establish the basis for knowledge-intensive signal detection.
One of the most important aspects of marketed-drug safety monitoring is the identification and analysis of new, medically important findings (so-called ‘signals’) that might influence the use of a medicine. According to the CIOMS VIII Working Group, a signal constitutes “information that arises from one or multiple sources (including observations and experiments), which suggests a new potentially causal association, or a new aspect of a known association, between an intervention and an event or set of related events, either adverse or beneficial, that is judged to be of sufficient likelihood to justify verificatory action”.
Computational analysis methods constitute an important tool for signal detection [3, 4]. Lately, the field of signal detection has been very active, with various large-scale collaborative initiatives and projects, such as EU-ADR (http://euadr-project.org/), Mini-Sentinel (http://www.mini-sentinel.org/), OMOP (http://omop.org/), and PROTECT (http://www.imi-protect.eu/). While various advances have been demonstrated, e.g. common data models, reference datasets for evaluation, and new analysis methods and systematic empirical assessments [7, 8, 9, 10, 11, 12], the challenge of accurate, timely and evidence-based signal detection remains.
In this paper, we first present a brief overview of postmarketing data sources and computational analysis methods, and highlight their strengths and limitations for signal detection, taking into account recent comparative studies. From this perspective, we indicate the need for combinatorial signal detection, relying on the concurrent exploitation of diverse data sources and detection methods, and refer to early successful examples. We argue that, in order to exploit combinatorial signal detection to its full potential, semantically-enriched detection frameworks are required to overcome existing barriers. We also illustrate how such a framework can be incorporated within the signal detection workflow, refer to example applications of semantic technologies in drug safety and, finally, discuss this perspective in the scope of large-scale, knowledge-intensive signal detection.
2 Data Sources and Signal Detection Methods: The Need for Combinatorial Exploitation
Spontaneous reporting systems (SRSs). These constitute the dominant signal source, through which cases of suspected adverse drug reactions (ADRs) are reported by healthcare professionals or citizens to regulatory authorities or other bodies. Typically, methods for the analysis of SRS data rely on the statistical investigation of disproportionality (DP), or are based on multivariate modeling [3, 4]. A comprehensive review of SRS-based signal detection methods has been presented by Hauben and Bate. Although SRSs have been analyzed extensively, advances in detection methods are still being reported, such as the vigiRank algorithm, which combines multiple strength-of-evidence prediction indicators to improve accuracy compared with DP analysis alone.
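To make the disproportionality idea concrete, the following Python sketch computes two widely used DP measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), from the standard 2×2 table of report counts for one drug–event pair. The function names, the counts and the signalling rule (at least three reports and a lower 95 % confidence bound of the ROR above 1) are illustrative assumptions, not the thresholds of any particular authority; Bayesian shrinkage methods such as MGPS or BCPNN are not covered.

```python
import math


def disproportionality(a: int, b: int, c: int, d: int) -> dict:
    """PRR and ROR for one drug-event pair from a 2x2 table of SRS report counts.

    a: reports mentioning the drug and the event
    b: reports mentioning the drug but not the event
    c: reports mentioning the event but not the drug
    d: reports mentioning neither
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: PRR/ROR undefined without a continuity correction")
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # Woolf's standard error of ln(ROR), used for an approximate 95% CI
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - 1.96 * se_log_ror)
    upper = math.exp(math.log(ror) + 1.96 * se_log_ror)
    return {"PRR": prr, "ROR": ror, "ROR_CI95": (lower, upper)}


def is_signal(a: int, result: dict, min_reports: int = 3) -> bool:
    # Illustrative rule only: enough reports and a lower 95% bound of the ROR above 1
    return a >= min_reports and result["ROR_CI95"][0] > 1.0


# Hypothetical counts for one drug-event pair
counts = dict(a=20, b=980, c=200, d=98_800)
result = disproportionality(**counts)
print(result["PRR"], result["ROR"], is_signal(counts["a"], result))
```

Because the ROR confidence interval widens quickly when the count `a` is small, a rule based on its lower bound is one simple way of avoiding signals driven by a handful of reports.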
Structured longitudinal observational healthcare databases. These are primarily obtained from Electronic Health Record (EHR) and administrative claims systems, and offer the potential to enable active and real-time surveillance. Signal detection methods applied to this type of data typically involve data-mining techniques originating in statistical epidemiology [7, 17], e.g. case–control methods, cohort methods, self-controlled case-series methods, and self-controlled cohort design methods [7, 9]. Notably, DP-based methods, originally proposed for the analysis of SRS data, have also been applied to observational data, following appropriate extensions and data transformations. A comprehensive review of signal detection methods exploiting observational data has been presented by Suling and Pigeot.
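The self-controlled idea can be illustrated with a minimal Python sketch: each exposed patient serves as their own control, and the pooled event rate during exposure is compared with the rate in an unexposed window before the first exposure. The data layout and names are hypothetical, and the sketch deliberately omits what real implementations add (confidence intervals, choice of risk windows, handling of time-varying confounding); it is not the OMOP method library code.

```python
from dataclasses import dataclass


@dataclass
class ExposedPatient:
    events_before: int     # outcome occurrences in the pre-exposure window
    days_before: float     # observed person-days before first exposure
    events_during: int     # outcome occurrences while exposed
    days_during: float     # exposed person-days


def self_controlled_irr(patients: list[ExposedPatient]) -> float:
    """Pooled incidence rate ratio (exposed vs. pre-exposure) in the same patients."""
    events_during = sum(p.events_during for p in patients)
    events_before = sum(p.events_before for p in patients)
    days_during = sum(p.days_during for p in patients)
    days_before = sum(p.days_before for p in patients)
    if events_before == 0 or days_during == 0:
        raise ValueError("rate ratio undefined: no pre-exposure events or no exposed time")
    return (events_during / days_during) / (events_before / days_before)


# Two hypothetical patients, each observed for a year before and 180 days during exposure
cohort = [ExposedPatient(1, 365, 2, 180), ExposedPatient(0, 365, 1, 180)]
print(self_controlled_irr(cohort))
```

Comparing each patient with themselves removes time-invariant differences between patients, which is the main attraction of self-controlled designs over a comparison with an unexposed cohort.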
Unstructured/free-text sources. Typical examples include clinical narratives, scientific literature and patient-generated content, e.g. in social media. Extracting information that associates drugs with adverse events from unstructured text requires text-mining techniques. Clinical narratives are a major part of many clinical information systems and, despite the complexity and barriers in processing clinical text, successful information extraction paradigms have been demonstrated [22, 23]. The literature has also been explored to provide indications for signals, e.g. by using corpora extracted from PubMed, and methods relying on the Medical Subject Headings (MeSH) indexing system and statistical inference [24, 25]. Patient-generated data, either shared among networked communities using social media (e.g. blogs, messaging/micro-blogging platforms and forums) or implicitly captured through Web search logs, have more recently been explored for signal detection, with interesting findings. A review on text mining for adverse drug event detection considering various types of free-text data has been presented by Harpaz et al.
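As a toy illustration of the simplest form of such extraction, the Python sketch below counts drug–event pairs mentioned in the same sentence. The mini-lexicons are hypothetical stand-ins for terminology-based dictionaries, and the sketch intentionally ignores negation (“no nausea” would still be counted), temporality and spelling variants, which are exactly the issues that make real text mining for drug safety hard.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical mini-lexicons; real pipelines would draw on RxNorm/MedDRA-based dictionaries
DRUGS = {"ibuprofen", "warfarin"}
EVENTS = {"bleeding", "headache", "nausea"}


def sentence_cooccurrences(documents):
    """Count drug-event pairs that are mentioned in the same sentence."""
    counts = Counter()
    for doc in documents:
        for sentence in re.split(r"[.!?]+", doc.lower()):
            tokens = set(re.findall(r"[a-z]+", sentence))
            for drug, event in product(tokens & DRUGS, tokens & EVENTS):
                counts[(drug, event)] += 1
    return counts


posts = ["Started warfarin last week. Noticed bleeding gums after warfarin.",
         "Ibuprofen gave me nausea!"]
print(sentence_cooccurrences(posts))
```

Co-occurrence counts of this kind could then feed a DP-style analysis, but only after the linguistic processing listed among the challenges of free-text sources.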
Data sources for signal detection: advantages, shortcomings and respective challenges for the application/development of computational signal detection methods based on empirical knowledge and the literature
Spontaneous reporting systems (SRSs):
Highly relevant (specific focus on drug safety incident documentation)
Controlled (data captured via predefined/standard forms)
Coverage of diverse populations in international SRSs
Public availability (in some cases, e.g. FAERS)
Use of independent information sources for hypothesis generation and assessment
Identify complex safety patterns, exceeding single drug–adverse event pairs
The ‘lack of denominator’ (i.e. only the number of people who are exposed to drugs and have the event is known, not the total number of people exposed to the drugs)
Observational healthcare databases (general):
Longitudinal healthcare information maintained by professionals (healthcare or administrative staff)
No interviewer bias
Enable active and real-time surveillance
Not designed for drug safety incident identification
Complexities and potential inabilities to extract data (including access issues)
Bias introduced by local terminologies/vocabularies
Need for sufficient data on drug exposure
Multiple options available for defining health outcomes/events, and exposure
Replication of results
Observational healthcare databases, electronic health records (EHRs):
Quality-controlled, detailed information (inpatient EHR data are supposed to provide accurate diagnosis, laboratory results, drug dosage and administration time)
Difficulty in acquiring an adequate sample size to cover diverse populations for drugs and events (however, population size can be increased by applying methods to combine data sources [5, 34], at the cost of potentially mismatched information and loss of accuracy)
Observational healthcare databases, administrative claims:
Database may be significantly large, offering great variety in the population
Questionable information granularity and accuracy, since they are maintained for billing purposes
Free-text data sources (general):
Vast information content
Not designed for drug safety incident identification
Requirement for sophisticated linguistic processing to account for colloquial language, grammatical/spelling errors, etc.
Free-text data sources, clinical narratives:
Produced by healthcare professionals
Contain rich documentation of clinical conditions, treatments, and patient history
Complexities and potential inabilities to extract data (including access issues)
Bias introduced by local documentation procedures
Account for temporal association among reports for the same patient
Efficient big data management and analytics
Free-text data sources, scientific literature:
Quality-controlled through peer review and (sometimes) indexing
May rely on assumptions and contain subjective conclusions
Cope with the varying strength of the provided evidence
Utilize indexing annotations, apply pure text processing, or use both?
Free-text data sources, patient-generated content (e.g. social media):
Real-time nature
Large-scale data production
Questionable reliability, validity and quality of data
Duplicates (reproduction of content from users)
Encapsulate mechanisms for quality control
Construct real-time surveillance methods
Efficient big data management and analytics
Summary of indicative comparative studies of signal detection methods: design and major findings
Studies comparing signal detection methods applied to SRS data
van Holle and Bauchau
Data: 147,015 reports dated from 1987 to 2000, in a corporate SRS
Methods: MGPS vs. a TTO algorithm
Evaluation measures: PPV primarily, and NPV, TP, FP, TN and FN secondarily
Reference set: events listed in a company’s Global Product Information System
Study design: extensive parameterization of methods, i.e. (i) for the DP-based method, a total of 336 different combinations of four stratification factors (sex, age, etc.) and cut-off values were assessed; (ii) for the TTO algorithm, 18 different combinations of alpha levels and time windows were investigated
Major findings: TTO algorithm superior to MGPS, whatever the choice of parameter values; trade-off between Sp and Sn, with TTO dependent on data quality; suggestion to use both methods to benefit from the greater ability of TTO to detect TP signals, while avoiding signals being missed (or delayed) when the respective data are of low quality
Harpaz et al.
Data: 4,784,337 public FAERS reports from 1968 to Q3 2011
Methods: MGPS, PRR, ROR, LR, ELR
Reference set: OMOP reference set
Study design: analysis of performance at fixed levels of Sn and Sp; application of the weighted Youden index to identify optimal signal thresholds; adoption of the broadest definition of events provided by OMOP (http://omop.org/HOI)
Major findings: multivariate modeling methods superior to DP-based methods; DP-based methods simpler and faster to compute; not all events are equally detectable
Studies comparing signal detection methods applied to observational healthcare data
Ryan et al.
Data: ten observational databases analyzed within OMOP (over 130 million records)
Methods: eight methods from the OMOP library (http://omop.org/MethodsLibrary)
Evaluation measures: threshold-based (i.e. Sn, Sp, PPV at RR thresholds) and threshold-free measures (e.g. AUC)
Reference set: nine drug–outcome pairs classified as ‘positive controls’ and 44 pairs classified as ‘negative controls’
Study design: multiple parameter settings explored per method
Major findings: many FP associations obtained from all methods; no clear optimal algorithm (result dependent on the desired trade-off between Sn and Sp)
Schuemie et al.
Data: data from 20 million subjects in seven databases across three European countries, dated from 1997 to 2000, in the scope of EU-ADR
Methods: four DP-based methods (BCPNN, GPS, PRR, ROR), three cohort methods (BHM, IRR, LGPS), two case-based methods (matched CC, SCCS), and one method for eliminating protopathic bias employed in combination with the previous methods (LEOPARD)
Reference set: positive and negative controls for 10 (of the 23) important events proposed in EU-ADR
Study design: assumption that, for DP methods, the occurrence of the event of interest during a period of drug exposure constitutes a potential drug–event association; common settings applied for all methods to define exposures and outcomes
Major findings: LEOPARD had a positive effect on the overall performance of all methods, but some of the known ADRs were incorrectly flagged as protopathic bias; LGPS and case–control adjusting for drug count slightly superior; DP-based methods had lower performance, although the difference was not statistically significant; some ADRs were not detected by all methods
Reps et al.
Data: subset of the THIN database (http://www.thin-uk.com/) of approximately 4 million patients, with over 358 million prescriptions and over 233 million medical events
Methods: HUNT, MUTARA, TPD and (a modified version of) ROR
Evaluation measures: natural threshold-based measures, average precision at cut-off K, AUC
Reference set: a set of known ADRs for specific drug families (NSAIDs, quinolones and calcium channel blockers, with multiple drugs per category)
Study design: assumption that all medical events occurring within 30 days of the drug prescription are possible drug–event pairs (i.e. filtering chronic conditions); multiple drugs from the same family were explored
Major findings: no generally superior algorithm for all the drugs considered in the study; none of the algorithms performed well at detecting rare events
Liu et al.
Data: 12 years of EMR data from Vanderbilt University Medical Center
Methods: BCPNN, GPS, PRR, ROR, Yule’s Q test and the chi-square test (CHI)
Evaluation measures: precision, recall and F-score
Reference set: two independent reference datasets of drug–event pairs: (1) 470 drug–event pairs (10 drugs and 47 laboratory abnormalities); (2) 378 drug–event pairs (9 drugs and 42 laboratory abnormalities)
Study design: assess the correlation of abnormal laboratory results and specific drug administrations by comparing the outcomes of a drug-exposed group and a matched unexposed group; a potential drug–laboratory test ADR involved an individual with a normal pre-drug laboratory test result who had an abnormal laboratory result after drug administration
Major findings: results varied according to the dataset: for the first dataset, ROR had the best F-score of 68 %, with 77 % precision and 61 % recall; for the second dataset, CHI, ROR, PRR, and Yule’s Q test all had the same F-score (62 %)
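Several of the studies summarized above report threshold-based measures computed from a confusion matrix of predicted versus reference signals. The Python helper below sketches these measures; the function names are illustrative, and the weighted Youden index is implemented in one common formulation, J_w = 2[w·Sn + (1 − w)·Sp] − 1 (w = 0.5 gives the usual Sn + Sp − 1); whether this matches the exact weighting used by Harpaz et al. is an assumption.

```python
def evaluation_measures(tp: int, fp: int, tn: int, fn: int, w: float = 0.5) -> dict:
    """Threshold-based measures from a confusion matrix of predicted vs. reference signals."""
    precision = tp / (tp + fp)              # positive predictive value (PPV)
    recall = tp / (tp + fn)                 # sensitivity (Sn)
    specificity = tn / (tn + fp)            # Sp
    npv = tn / (tn + fn)                    # negative predictive value
    f_score = 2 * precision * recall / (precision + recall)
    # Weighted Youden index; w = 0.5 gives the usual J = Sn + Sp - 1
    youden_w = 2 * (w * recall + (1 - w) * specificity) - 1
    return {"PPV": precision, "Sn": recall, "Sp": specificity,
            "NPV": npv, "F": f_score, "Youden": youden_w}


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check with the first Liu et al. dataset (77 % precision, 61 % recall)
print(round(f_score(0.77, 0.61), 2))  # → 0.68
```

The check reproduces the 68 % F-score reported for ROR, confirming that the three reported figures are mutually consistent.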
Shortcomings in detection accuracy/efficiency: a high rate of false-positive indications [38, 43] and difficulties in detecting rare ADRs were reported, while some events were not detectable despite the variety of the employed methods.
Complementarity: a time-to-onset (TTO)-based method and a DP-based method applied to SRS data, as well as DP-based methods and multivariate-based signal detection strategies exploiting observational data, were found to be complementary.
Based on the above remarks, we can conclude that all the available data sources and the concurrent use of different signal detection methods need to be considered in the construction of a holistic signal detection framework. In the following section, we refer to two characteristic studies, which elaborated on combining information across diverse data sources and signal detection methods.
3 On Combinatorial Computational Signal Detection: Examples
3.1 Joint Signaling in a Spontaneous Reporting System and an Electronic Health Record System
Given the maturity of drug surveillance based on SRS data, the progress made in the use of observational healthcare data, and the expectation that the two sources may complement each other, Harpaz et al. argued that it is worth considering computational approaches that combine information from these two types of sources. The study was motivated by the assumption that a combinatorial investigation would either increase the evidence or statistical power of findings, or would facilitate new discoveries that may not be possible with either source separately. In particular, the study elaborated on the joint analysis of 4 million reports obtained from the FDA Adverse Event Reporting System (FAERS) and information extracted from 1.2 million EHR narratives, using DP analysis, in order to generate a highly selective ranked set of candidate ADRs and, consequently, improve the accuracy of signal detection. The ranked outcomes were assessed using the ‘Precision at K’ metric. The focus was on three serious adverse reactions, while a reference set of over 600 established and plausible ADRs was used to evaluate the proposed approach against the single, FAERS-based signal detector.
The combined signaling system demonstrated a large, statistically significant improvement over FAERS alone in the precision of top-ranked signals (ranging from 31 % to almost threefold, depending on the evaluation category). Perhaps more importantly, this combinatorial analysis enabled the detection of a new association between a drug agent and an event that was supported by clinical review. Thus, the study concluded with promising initial evidence that exploring FAERS and EHR data through replicated signaling can improve the accuracy of signal detection in specific cases.
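The replicated-signaling idea and the ‘Precision at K’ evaluation can be sketched in a few lines of Python. The sketch assumes that each source yields one DP score per drug–event pair; the combination rule (keep pairs above the threshold in both sources and rank them by the weaker of the two scores), the drug and event names, and the scores are all hypothetical and are not claimed to be the scoring used by Harpaz et al.

```python
Pair = tuple[str, str]


def replicated_ranking(scores_a: dict[Pair, float], scores_b: dict[Pair, float],
                       threshold: float = 1.0) -> list[Pair]:
    """Keep drug-event pairs that exceed the threshold in BOTH sources,
    ranked by the weaker of their two scores (an assumed combination rule)."""
    combined = {
        pair: min(scores_a[pair], scores_b[pair])
        for pair in scores_a.keys() & scores_b.keys()
        if scores_a[pair] > threshold and scores_b[pair] > threshold
    }
    # Sort by descending combined score; ties broken by pair for determinism
    return sorted(combined, key=lambda p: (-combined[p], p))


def precision_at_k(ranked: list[Pair], reference: set[Pair], k: int) -> float:
    """Share of the top-k ranked pairs that appear among the reference positives."""
    return sum(pair in reference for pair in ranked[:k]) / k


faers = {("drug_a", "event_x"): 3.2, ("drug_b", "event_x"): 2.5,
         ("drug_c", "event_y"): 1.8, ("drug_d", "event_z"): 4.0}
ehr = {("drug_a", "event_x"): 2.0, ("drug_b", "event_x"): 0.9,
       ("drug_c", "event_y"): 1.5, ("drug_e", "event_z"): 2.2}
ranked = replicated_ranking(faers, ehr)
print(ranked, precision_at_k(ranked, {("drug_a", "event_x")}, k=2))
```

Requiring replication trades recall for precision: pairs seen in only one source (here `drug_b` and `drug_d`) are dropped, which is consistent with the goal of a highly selective ranked list.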
3.2 Signal Detection by Integrating Chemical, Biological and Phenotypic Properties of Drugs
The study of Liu et al. followed an integrative perspective for ADR prediction by employing data beyond the phenotype level. Specifically, ADRs were predicted by jointly exploiting chemical (e.g. compound fingerprints or substructures), biological (including protein targets and pathways), and phenotypic properties of drugs (including indications and other known ADRs). Interestingly, the study suggested an approach for ADR prediction that combines the above types of information at different stages of drug surveillance, i.e. chemical and biological information for preclinical drug screening, and chemical, biological and phenotypic information for postmarket surveillance.
This integrative analysis was focused on the prediction of 1385 known ADRs of 832 approved drugs, through five different analysis methods, namely logistic regression, naïve Bayes, K-nearest neighbor, random forest and a support vector machine. The elaborated data were obtained from public databases, while the evaluation was based on accuracy, precision, and recall, which were obtained from the best operating points of the global receiver operating characteristic (ROC) curve (resulting from merging the prediction scores for all ADRs). The study indicated that from the three types of information, phenotypic data were the most informative for ADR prediction. However, when biological and phenotypic features were added to the baseline chemical information, the proposed prediction model achieved significant improvements and successfully predicted ADRs associated with the withdrawal of specific drugs.
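Of the five classifiers named above, K-nearest neighbor is the easiest to sketch. The Python example below represents each drug by one combined set of prefixed features (`chem:`, `target:`, `pheno:`) and scores a candidate ADR as the fraction of the k most similar known drugs (by Jaccard similarity) that have it. The feature names, drugs and ADR sets are hypothetical, and the sketch is not the authors' pipeline (no model tuning and no ROC-based choice of operating point).

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_adr_score(query: set[str], training: dict[str, tuple[set[str], set[str]]],
                  adr: str, k: int = 3) -> float:
    """Fraction of the k most similar drugs (Jaccard over combined features) known to have `adr`."""
    neighbours = sorted(training.items(),
                        key=lambda item: (-jaccard(query, item[1][0]), item[0]))[:k]
    return sum(adr in known_adrs for _, (_, known_adrs) in neighbours) / len(neighbours)


# Hypothetical drugs: (combined chemical/biological/phenotypic features, known ADRs)
training = {
    "drug_1": ({"chem:ring_a", "target:t1", "pheno:nausea"}, {"hepatotoxicity"}),
    "drug_2": ({"chem:ring_a", "target:t2"}, {"hepatotoxicity", "rash"}),
    "drug_3": ({"chem:ring_b", "target:t7", "pheno:headache"}, {"rash"}),
}
new_drug = {"chem:ring_a", "target:t1"}  # e.g. preclinical: chemical + biological only
print(knn_adr_score(new_drug, training, "hepatotoxicity", k=2))
```

Dropping or adding the `pheno:` features in the query mirrors the stage-dependent use of information suggested by the study: preclinical screening would rely on chemical and biological features only.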
4 Towards Semantically-Enriched, Integrated Signal Detection
The studies presented in Sect. 3 differ not only in the data employed but also in the computational methods used. Nevertheless, both studies explored ways to reinforce signal detection outcomes, via replicated signaling or by integrating phenotypic data with biological and chemical drug information, respectively, with significant findings. Given the heterogeneity of data sources and the variety of computational methods employed for signal detection, we consider it important to adopt methods and tools from the fields of semantic technologies and knowledge engineering in order to further elaborate and systematically establish such combinatorial approaches.
4.2 Challenges for Large-Scale, Combinatorial Signal Detection
I.1 Employing an appropriate description schema to express quality attributes of data sources that are useful for signal detection, as regards their structure, content and provenance. For SRSs, example attributes may be the coverage of predominant drugs and adverse effects, as well as the level of seriousness of the reports. Attributes for observational data sources may be the population size and the observation period. Similarly, a concise description of signal detection methods with respect to their input, output and analysis parameters is required, enabling their combined use either in parallel or in a conditional pipeline based on the output of individual steps. These descriptions would be provided by stakeholders contributing data and methods to the framework, in order to facilitate their selection and combined use by stakeholders interested in conducting signal detection.
I.2 The concrete definition of study parameters concerning, for example, the drugs and health outcomes of interest (HOIs), enabling consistent comparisons of analysis experiments and their findings. Studies exploring observational data have shown that this issue is of paramount importance and has a major impact on the outcomes of the analysis [35, 50].
I.3 Hiding the technical complexity of using signal detection method implementations, and guiding users to select and fine-tune their parameters. Stang et al. highlighted a significant variation among researchers in study design choices for signal detection, due to various factors. Thus, it is important to provide guidance on study setup, given the focus of a signal analysis experiment. The PROMPT tool of Mini-Sentinel constitutes relevant work along this line (http://mini-sentinel.org/methods/methods_development/details.aspx?ID=1044).
I.4 Supporting the evaluation of analysis outcomes against reference sources, both for novelty detection (i.e. to exclude known ADRs) and for the acquisition of supportive evidence (e.g. biological attributes of drugs).
I.6 The common description of analysis results originating from experiments that involve multiple data sources and signal detection method implementations, along with the parameterization applied, including provenance information (i.e. the origin of the results with respect to the data, methods and parameter values that have been employed). Such a description will facilitate sharing of results, systematic comparisons, and assessing whether replication of experiments leads to similar results.
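The kind of descriptions called for in issues I.1 and I.6 can be pictured with a plain-Python stand-in. The sketch below uses dataclasses rather than an ontology, and every field name (`source_type`, `input_type`, `output_measure`, and so on), as well as the method name `SCC`, is a hypothetical choice for illustration: declared input types allow compatible methods to be selected automatically, and each result carries the data source, method and parameter values it came from.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DataSourceDescription:
    name: str
    source_type: str                                   # e.g. "SRS", "observational"
    quality: dict = field(default_factory=dict)        # e.g. population size, period


@dataclass
class MethodDescription:
    name: str
    input_type: str                                    # data source type it accepts
    output_measure: str                                # e.g. "ROR", "IRR"
    default_parameters: dict = field(default_factory=dict)


@dataclass
class SignalResult:
    drug: str
    event: str
    score: float
    source: DataSourceDescription                      # provenance: which data
    method: MethodDescription                          # provenance: which method
    parameters: dict                                   # provenance: values actually used


def compatible_methods(source, methods):
    """Select methods whose declared input type matches the data source type."""
    return [m for m in methods if m.input_type == source.source_type]


def to_json(result: SignalResult) -> str:
    """Serialize a result together with its provenance for sharing and comparison."""
    return json.dumps(asdict(result), sort_keys=True)


faers = DataSourceDescription("FAERS", "SRS", {"public": True})
ror = MethodDescription("ROR", "SRS", "ROR", {"min_reports": 3})
scc = MethodDescription("SCC", "observational", "IRR")
selected = compatible_methods(faers, [ror, scc])
record = SignalResult("drug_a", "event_x", 4.2, faers, selected[0], {"min_reports": 3})
print(to_json(record))
```

In a semantically-enriched framework these records would instead be instances of ontology classes, which would additionally allow reasoning over them (e.g. inferring that a method accepting “observational data” also accepts an EHR database).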
4.3 Semantically-Enriched Integrated Signal Detection
E.1 Ontologies for annotating datasets and methods involved in signal detection with quality attributes (to address issue I.1). To some extent, such annotation has been elaborated in the scope of defining common data models that facilitate the analysis of observational data from various sources, with a major focus on the syntactic rather than the semantic level. Moreover, a common description of methods is missing.
E.2 Semantic mappings between terminologies and/or ontologies obtained from ontology repositories (e.g. BioPortal, http://bioportal.bioontology.org/), which can facilitate the consistent definition of health outcomes and/or drugs of interest for a given signal detection scenario exploring diverse data sources through terminological/ontological reasoning (to address issue I.2).
E.3 Semantic rules linking the above quality attributes in order to support users in selecting the appropriate data, analysis methods and parameter settings for their drug surveillance use case, e.g. the targeted drug(s) or health conditions of interest (to address issue I.3).
E.4 Multifaceted querying of diverse drug safety resources for novelty assessment and for obtaining supporting evidence to interpret the findings of signal detection (to address issue I.4). This feature is possible thanks to the public availability of exploitable repositories of linked data in the life sciences domain, such as Bio2RDF and the EBI RDF platform, which provide programmatic access to resources such as ChEMBL (https://www.ebi.ac.uk/chembl/), ClinicalTrials.gov (http://clinicaltrials.gov/), DrugBank (http://www.drugbank.ca/) and SIDER (http://sideeffects.embl.de/), to name a few.
E.5 Ontology-based annotation of outcomes with provenance information, enabling their sharing and facilitating comparisons (to address issue I.6).
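The terminological reasoning behind element E.2 can be reduced to a small Python sketch: expanding a health outcome of interest to all of its descendant concepts in an is-a hierarchy, and then collecting the source-specific codes mapped to those concepts, yields one consistent outcome definition per data source. The hierarchy, concept names and codes (`A-101`, `B-7`, and so on) are hypothetical; a real framework would obtain them from terminologies in a repository such as BioPortal and use an ontology reasoner rather than Python dictionaries.

```python
from collections import defaultdict

# Hypothetical is-a edges (child, parent); a concept may have several parents
IS_A = [
    ("hepatitis", "liver disorder"),
    ("acute liver failure", "liver disorder"),
    ("autoimmune hepatitis", "hepatitis"),
    ("rash", "skin disorder"),
]

# Hypothetical mappings from concepts to local codes in two data sources
MAPPINGS = {
    "hepatitis": {"source_A": ["A-101"], "source_B": ["B-7"]},
    "autoimmune hepatitis": {"source_A": ["A-102"]},
    "acute liver failure": {"source_A": ["A-110"], "source_B": ["B-9"]},
    "rash": {"source_A": ["A-500"]},
}


def descendants(concept, edges):
    """The concept itself plus every concept below it in the is-a hierarchy."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    found, stack = {concept}, [concept]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def hoi_codes(concept, edges, mappings, source):
    """All local codes in `source` that fall under the health outcome `concept`."""
    return sorted(
        code
        for c in descendants(concept, edges)
        for code in mappings.get(c, {}).get(source, [])
    )


print(hoi_codes("liver disorder", IS_A, MAPPINGS, "source_A"))  # → ['A-101', 'A-102', 'A-110']
print(hoi_codes("liver disorder", IS_A, MAPPINGS, "source_B"))  # → ['B-7', 'B-9']
```

Defining the outcome once at the concept level, rather than as hand-picked code lists per database, is what makes the resulting analyses comparable across sources.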
We are implementing some of the above elements within the SAFER project (http://safer-project.eu/), which develops a semantically-enriched platform for combinatorial signal detection by exploring diverse open-source signal detection method implementations and publicly available data. In particular, we elaborate on the semantic harmonization of computational signal detection methods through an ontology model, aiming to build a semantic registry of such methods and an integrated platform for experimentation on pharmacovigilance signal detection. Based on this ontology, we design and implement software interfaces to mediate between existing signal detection method implementations and to aggregate their outcomes, facilitating their exploitation under a common integrated framework. Besides using the built-in criteria of the employed computational methods for signal generation and ranking, we investigate other ways to address prioritization and management of the obtained outcomes (e.g. factors used in triage models [56, 57]), a challenging issue stemming from the parallel use of multiple signal detection methods that has also been remarked on by van Holle and Bauchau and by Harpaz et al.
Notably, the Observational Health Data Sciences and Informatics (OHDSI, http://ohdsi.org/) collaborative is developing a global knowledge base (KB) that brings together and standardizes information on drugs and HOIs from various electronic sources relevant to drug safety. This KB will be extremely useful for establishing a common reference for assessing the outcomes of computational signal detection methods, and is closely related to element E.4 described above.
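The mediation and aggregation described above can be pictured with a small adapter-pattern sketch in Python. Everything in it is hypothetical and is not the SAFER interface: each wrapped method exposes the same `flagged_pairs` call over a common data format (here, 2×2 report counts per drug–event pair), and outcomes are aggregated by the naive rule of counting how many methods flag each pair, which is only a baseline for the prioritization question raised in the text.

```python
from abc import ABC, abstractmethod
from collections import Counter

Pair = tuple[str, str]
Table = tuple[int, int, int, int]     # (a, b, c, d) 2x2 report counts


class DetectorAdapter(ABC):
    """Hypothetical common interface that each wrapped method implementation exposes."""

    name: str

    @abstractmethod
    def flagged_pairs(self, data: dict[Pair, Table]) -> set[Pair]:
        ...


class ThresholdDetector(DetectorAdapter):
    """Wraps any scoring function of a 2x2 table and flags pairs above a threshold."""

    def __init__(self, name, score_fn, threshold):
        self.name, self.score_fn, self.threshold = name, score_fn, threshold

    def flagged_pairs(self, data):
        return {pair for pair, table in data.items() if self.score_fn(*table) > self.threshold}


def aggregate(detectors, data):
    """Naive aggregation: rank pairs by how many methods flag them."""
    votes, flagged_by = Counter(), {}
    for detector in detectors:
        for pair in sorted(detector.flagged_pairs(data)):
            votes[pair] += 1
            flagged_by.setdefault(pair, []).append(detector.name)
    ranking = sorted(votes, key=lambda p: (-votes[p], p))
    return ranking, flagged_by


prr = ThresholdDetector("PRR", lambda a, b, c, d: (a / (a + b)) / (c / (c + d)), 2.0)
ror = ThresholdDetector("ROR", lambda a, b, c, d: (a * d) / (b * c), 3.0)
data = {("drug_x", "event_1"): (20, 980, 200, 98_800),
        ("drug_y", "event_1"): (5, 995, 500, 98_500),
        ("drug_z", "event_2"): (6, 994, 200, 98_800)}
ranking, flagged_by = aggregate([prr, ror], data)
print(ranking, flagged_by)
```

Keeping the list of methods that flagged each pair, not just the vote count, preserves the provenance needed later for triage and for comparing methods.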
Taking into account the availability of diverse signal detection method implementations as open source (e.g. http://omop.org/MethodsLibrary, http://cran.r-project.org/web/packages/PhViD/, http://mini-sentinel.org/methods/methods_development/), public repositories of linked data related to drug safety, as well as initiatives for openly exposing drug safety data through programming interfaces such as openFDA (https://open.fda.gov/), we believe that, although challenging, the establishment of semantically-enriched platforms for integrated signal detection is feasible.
Although not explicitly related to the integrated signal detection perspective presented in this paper, semantic technologies have been employed in several drug safety applications, e.g. in order to automate signal detection, address heterogeneous data integration for safety assessment, normalize drug safety data, facilitate adverse event reporting, and semantically analyze case reports. Aiming to illustrate the applicability and added value of these technologies in the domain of drug safety, the following section refers to some interesting examples.
5 Applications of Semantic Technologies in Drug Safety
Bousquet et al. developed PharmaMiner, a tool aiming to reinforce automated signal detection when adverse events are expressed using the Medical Dictionary for Regulatory Activities (MedDRA®). The study relied on the argument that automatically grouping MedDRA® terms that express similar medical conditions would increase the power of detection algorithms. Through an ontology, PharmaMiner employed terminological reasoning in order to group semantically linked adverse events, and applied standard statistical analysis methods for the detection of potential signals. Evaluation of PharmaMiner with a dataset of 42,284 case reports extracted from the French Pharmacovigilance database showed that the approach enabled the identification of more occurrences of drug–ADR associations than the original MedDRA® hierarchy, and could thus help pharmacovigilance experts to increase the number of responses to their queries when investigating case reports coded with MedDRA®. Recently, Bousquet et al. extended the approach for automatically grouping MedDRA® terms through the OntoADR ontology.
Stephens et al. presented a use case of semantic web technologies for drug safety, focusing on heterogeneous data integration and analysis. The motivation was that semantic technologies can simplify data integration across multiple sources and support the logic to infer additional insights from the data. Given that effective decision making regarding a drug’s safety profile requires the assessment of all the available information regarding, for example, the compound, the target and the patient group, Stephens et al. employed ontology-based inferencing and rules to guide decisions on either continuing to pursue a compound or withdrawing a drug from the market through the analysis of diverse data. The approach was illustrated via an example rule assessing drug toxicity risk, which employed diverse parameters used along the drug discovery and development pipeline, such as the structural similarity of the compound of interest to others that failed due to toxicity, the binding of the compound to a target carrying a number of single nucleotide polymorphisms (associated with the range of response to the drug), the therapeutic index, clinical findings, therapeutic dose, etc.
Wang et al. elaborated on the normalization of FAERS data through semantic technologies. The objectives of their work were, first, to improve the mining capacity of FAERS data for signal detection and, second, to promote semantic interoperability between FAERS and other data sources. As drugs are registered in FAERS under arbitrary names, e.g. trade names and abbreviations, which may even contain typographical errors, the lack of drug normalization introduces substantial barriers to data integration in signal detection. Wang et al. normalized drug information contained in FAERS using RxNorm (http://www.nlm.nih.gov/research/umls/rxnorm/), a standard terminology for medications, while drug class information was obtained from the US National Drug File Reference Terminology (NDF-RT, http://www.nlm.nih.gov/research/umls/sourcereleasedocs/current/NDFRT). Regarding adverse event names, although FAERS already provides normalized terms based on MedDRA® preferred terms (PTs), Wang et al. showed that linking PTs to their corresponding system organ class (SOC) categories further strengthens data aggregation. The study resulted in a publicly available knowledge resource (http://informatics.mayo.edu/adepedia/index.php/Download), which can be extended by connecting data from clinical notes, scientific literature, gene expression, etc., and various other ontologies.
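The two normalization steps described above, mapping free-form drug names to a standard ingredient and rolling PTs up to SOCs, can be sketched in Python with hand-made lookup tables. The tables are tiny hypothetical stand-ins for RxNorm and the MedDRA® hierarchy (in MedDRA® a PT may be linked to several SOCs; the sketch keeps a single SOC per PT), and typographical errors, which the text notes are common in FAERS, are not handled.

```python
from collections import Counter

# Hypothetical, hand-made stand-ins for RxNorm normalization and the MedDRA PT -> SOC link
DRUG_SYNONYMS = {
    "tylenol": "acetaminophen",
    "paracetamol": "acetaminophen",
    "apap": "acetaminophen",
    "advil": "ibuprofen",
}
PT_TO_SOC = {
    "Nausea": "Gastrointestinal disorders",
    "Vomiting": "Gastrointestinal disorders",
    "Headache": "Nervous system disorders",
}


def normalize_drug(raw_name: str) -> str:
    """Map a free-form drug name to an ingredient name (unknown names pass through)."""
    key = raw_name.strip().lower()
    return DRUG_SYNONYMS.get(key, key)


def soc_level_counts(reports):
    """Aggregate (drug, PT) report pairs into (ingredient, SOC) counts."""
    counts = Counter()
    for raw_drug, pt in reports:
        counts[(normalize_drug(raw_drug), PT_TO_SOC.get(pt, "Unmapped"))] += 1
    return counts


reports = [("Tylenol ", "Nausea"), ("APAP", "Vomiting"),
           ("paracetamol", "Headache"), ("Advil", "Nausea")]
print(soc_level_counts(reports))
```

Without normalization, the three acetaminophen reports would be split across three different drug names, which is the aggregation barrier that motivated the study.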
Aiming to address interoperability issues that hamper the conduct of postmarketing safety analysis studies on top of EHR systems, the SALUS project (http://www.salusproject.eu/) built a semantic framework and a dedicated toolkit to serve this purpose. An interesting part of this toolkit was a component aimed at facilitating spontaneous reporting by prepopulating the respective forms with relevant EHR data. To this end, standard (e.g. based on HL7) and proprietary EHR data models were mapped to the E2B data model (Electronic Standards for the Transfer of Regulatory Information, http://estri.ich.org/), while terminology mapping and reasoning services were designed to ensure the automatic conversion of local EHR terminologies, e.g. International Classification of Diseases, Tenth Revision (ICD-10, http://www.who.int/classifications/icd/en/) or Logical Observation Identifiers Names and Codes (LOINC®, http://loinc.org/), to MedDRA®, which is dominant for adverse event reporting. The partial automation of the adverse event reporting process achieved through this tool is expected to contribute to the reduction of underreporting, which has often been cited as a limitation of SRSs.
Sarntivijai et al. employed an ontology to conduct a comparative analysis of adverse events associated with two types of seasonal influenza vaccines, namely killed and live influenza vaccines. The study analyzed reports from the US Vaccine Adverse Event Reporting System (VAERS) concerning a number of vaccines in these two categories, using measures such as the proportional reporting ratio (PRR) and the chi-square statistic. The identified adverse events were then grouped according to their semantic similarity using the Ontology of Adverse Events (OAE, http://www.oae-ontology.org/). This approach yielded better classification results than MedDRA®- and Systematized Nomenclature of Medicine–Clinical Terms (SNOMED–CT®)-based classifications. The analysis indicated that live influenza vaccines were less likely than killed vaccines to induce two severe adverse events. It also showed that a previously reported positive correlation between one of these serious events and influenza vaccination was based on trivalent rather than monovalent influenza vaccines.
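Both disproportionality measures mentioned above are computed from a 2×2 table of report counts for a given vaccine and adverse event. The four cells are defined as follows:

- a = reports of the vaccine with the event
- b = reports of the vaccine with other events
- c = reports of other vaccines with the event
- d = reports of other vaccines with other events

Then PRR = [a/(a+b)] / [c/(c+d)], and Pearson's chi-square is N(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)] with N = a+b+c+d. The sketch below implements these standard formulas, with an optional Yates continuity correction. The counts in the example are hypothetical, and the exact variant and thresholds used by Sarntivijai et al. are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    a: int  # reports: vaccine of interest AND adverse event of interest
    b: int  # reports: vaccine of interest, other adverse events
    c: int  # reports: other vaccines AND adverse event of interest
    d: int  # reports: other vaccines, other adverse events

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def prr(t: Table2x2) -> float:
    """Proportional reporting ratio: [a/(a+b)] / [c/(c+d)]."""
    if t.a + t.b == 0 or t.c == 0:
        raise ValueError("PRR undefined: need a+b > 0 and c > 0")
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def chi_square(t: Table2x2, yates: bool = False) -> float:
    """Pearson chi-square for a 2x2 table, optionally Yates-corrected."""
    denom = (t.a + t.b) * (t.c + t.d) * (t.a + t.c) * (t.b + t.d)
    if denom == 0:
        raise ValueError("chi-square undefined: a row or column total is zero")
    diff = abs(t.a * t.d - t.b * t.c)
    if yates:
        # Continuity correction, clamped so it never goes below zero.
        diff = max(diff - t.n / 2, 0.0)
    return t.n * diff ** 2 / denom


# Hypothetical counts for one vaccine/event pair.
table = Table2x2(a=20, b=80, c=100, d=1800)
print(f"PRR = {prr(table):.2f}")                               # 3.80
print(f"chi2 = {chi_square(table):.2f}")                       # 36.58
print(f"chi2 (Yates) = {chi_square(table, yates=True):.2f}")   # 34.01
```

Repeating the computation for each vaccine against the rest of the database gives per-vaccine PRRs that can then be compared across the killed and live categories.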
6 Discussion and Conclusions
Signal detection has recently been a very active research field, demonstrating not only important advances but also many new challenges [4, 28, 65, 66]. Additional data sources are being considered for signal detection and, consequently, new computational methods and approaches are constantly being proposed for their analysis [16, 26, 27]. A number of comparative studies have yielded important findings regarding the capacity of new and established methods and data sources for signal detection [31, 33, 38, 41, 43, 44]. Although the extent to which the outcomes of these studies are generalizable is debatable, it is clear that the concurrent use of multiple methods and data sources is essential.
Notably, increased drug surveillance through the synthesis of all possible information sources has been suggested by regulatory bodies and highlighted in the literature. Given the need to obtain more reliable and timely insights into drug safety risks, it has been reasonably argued that combining information across data sources could lead to more effective and accurate signal detection. Such combinatorial investigations are expected to strengthen the evidence for the obtained results or to provide new insights that may not be possible by investigating a single source. Some early successful examples of this approach to signal detection have recently been presented in the literature [45, 47].
However, to explore the full potential of this perspective, we need systematic frameworks that enable pharmacovigilance stakeholders to seamlessly share, access, and effectively use different data sources and computational methods for signal detection. In this article, we highlighted the challenges towards such a development and argued that semantic technologies provide the technical means for this advancement. We also introduced concrete elements towards an integrated, semantically-enriched signal detection framework. These elements span the description of data sources and computational methods to support their selection, support for study setup, advanced access to reference linked-data resources for evaluation, and uniform description of the obtained outcomes.
The applicability and value of semantic technologies in drug safety have been illustrated in several applications. In line with the integrated signal detection perspective, we are currently developing a semantically-enriched signal detection platform that relies on the semantic harmonization of signal detection methods and data sources. Our research complements other efforts in the field, such as the OHDSI KB and the SALUS semantic interoperability platform and tools. It brings a new perspective to large-scale, knowledge-intensive signal detection and aspires to increase efficiency, automation, support, and collaboration for pharmacovigilance stakeholders.
This research was supported by a Marie Curie Intra European Fellowship within the 7th European Community Framework Programme FP7/2007–2013 under Research Executive Agency (REA) Grant agreement No. 330422—the SAFER project. Vassilis Koutkias and Marie-Christine Jaulent have no conflicts of interest to declare.
- 1. World Health Organization. A practical handbook on the pharmacovigilance of antimalarial medicines. Geneva: WHO Document Production Services; 2008.
- 2. Council for International Organizations of Medical Sciences. Practical aspects of signal detection in pharmacovigilance: report of CIOMS Working Group VIII. Geneva: Council for International Organizations of Medical Sciences; 2010.
- 55. Koutkias V, Jaulent MC. Leveraging post-marketing drug safety research through semantic technologies: the PharmacoVigilance Signal Detectors Ontology. In: Proceedings of the International Workshop on Semantic Web Applications and Tools for Life Sciences (SWAT4LS): 9–11 Dec 2014, Berlin.
- 65. Platt R, Carnahan R, editors. The US Food and Drug Administration’s Mini-Sentinel Program. Pharmacoepidemiol Drug Saf. 2012;21:S1–303. http://onlinelibrary.wiley.com/doi/10.1002/pds.v21.S1/issuetoc
- 66. Evans SJW, editor. Studying the science of observational research: empirical findings from the Observational Medical Outcomes Partnership. Drug Saf. 2013;36:S1–204. http://link.springer.com/journal/40264/36/1/suppl/page/1
- 67. European Commission. eHealth for safety: impact of ICT on patient safety and risk management. 2007. Available at: http://www.ehealth-for-safety.org/news/documents/eHealth-safety-report-final.pdf. Accessed 12 Feb 2015.
Open Access: This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.