Complementing Observational Signals with Literature-Derived Distributed Representations for Post-Marketing Drug Surveillance
Because of the well-documented limitations of data collected by spontaneous reporting systems (SRS), such as bias and under-reporting, a number of authors have evaluated other data sources for pharmacovigilance, including the biomedical literature. Previous work has demonstrated the utility of literature-derived distributed representations (concept embeddings), combined with machine learning, for drug side-effect prediction. These methods draw on complementary data sources and observe drug safety from two different perspectives: knowledge extracted from the literature and statistics from SRS data. However, the utility of combining them has yet to be evaluated.
This research investigates the utility of combining an observational signal from SRS with literature-derived distributed representations for downstream machine learning (logistic regression), either directly, by merging both into a single feature vector, or indirectly, in an ensemble approach.
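The two combination strategies can be illustrated with a minimal sketch. It assumes a plain L2-regularised logistic regression fitted by batch gradient descent (a stand-in, not the authors' implementation), illustrative hyperparameters, and, for the ensemble, that the two models' predicted probabilities are simply averaged; the source does not specify the ensembling rule. The names `SimpleLogisticRegression`, `combine_direct`, and `combine_ensemble` are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SimpleLogisticRegression:
    """Minimal L2-regularised logistic regression fitted by batch gradient descent."""

    def __init__(self, lr: float = 0.1, epochs: int = 500, l2: float = 0.01):
        self.lr, self.epochs, self.l2 = lr, epochs, l2  # illustrative values only
        self.w: List[float] = []
        self.b = 0.0

    def _prob(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "SimpleLogisticRegression":
        n, dim = len(X), len(X[0])
        self.w, self.b = [0.0] * dim, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * dim, 0.0
            for xi, yi in zip(X, y):
                err = self._prob(xi) - yi  # gradient of the log-loss w.r.t. the linear score
                for j in range(dim):
                    grad_w[j] += err * xi[j]
                grad_b += err
            for j in range(dim):
                self.w[j] -= self.lr * (grad_w[j] / n + self.l2 * self.w[j])
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, X: Sequence[Sequence[float]]) -> List[float]:
        return [self._prob(x) for x in X]


def combine_direct(embeddings: Sequence[Sequence[float]],
                   srs_signals: Sequence[Sequence[float]]) -> List[List[float]]:
    """Direct combination: append the SRS signal feature(s) to each pair embedding."""
    return [list(e) + list(s) for e, s in zip(embeddings, srs_signals)]


def combine_ensemble(p_literature: Sequence[float], p_srs: Sequence[float]) -> List[float]:
    """Indirect combination (assumed rule): average two separately trained models' probabilities."""
    return [(a + b) / 2.0 for a, b in zip(p_literature, p_srs)]
```

Direct combination lets one model weigh the SRS signal against every embedding dimension; the ensemble keeps the two sources in separate models, so each can be trained or replaced independently.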
Leveraging a recently developed representation scheme, concept embeddings were generated from relational connections extracted from the literature and composed to represent drugs and their associated adverse reactions, as defined by two reference standards of positive (likely causal) and negative (no causal evidence) drug-adverse reaction pairs. Embeddings were provided to logistic regression classifiers with and without common measures of observational signal from SRS sources, and performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC).
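The source does not state which composition operators the representation scheme uses, so the following is only a sketch under two assumptions: an adverse reaction defined by several literature concepts is represented by the normalised sum (superposition) of their vectors, and a drug-reaction pair is represented by the element-wise (Hadamard) product of the unit-normalised drug and reaction vectors. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def compose_concepts(vectors: Sequence[Sequence[float]]) -> Vector:
    """Superpose (sum, then normalise) the embeddings of several concepts, e.g. the
    concepts that together define one adverse reaction (assumed composition)."""
    total = [0.0] * len(vectors[0])
    for v in vectors:
        for i, x in enumerate(v):
            total[i] += x
    return normalize(total)


def compose_pair(drug_vec: Sequence[float], adr_vec: Sequence[float]) -> Vector:
    """Represent a drug-ADR pair as the element-wise product of the unit-normalised
    drug and ADR vectors (illustrative choice, not necessarily the paper's operator)."""
    d, a = normalize(drug_vec), normalize(adr_vec)
    return [x * y for x, y in zip(d, a)]
```

The components of a Hadamard-product vector sum to the cosine similarity between the drug and reaction vectors, so a logistic regression over it can learn a per-dimension weighted similarity. Concatenating the two vectors instead would let a linear model score the drug and the reaction only additively, with no interaction between them.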
When models leverage both sources of information, ROC AUC improves by up to ≈ 20% over SRS-based disproportionality metrics alone and exceeds the best prior results reported in the literature.
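An SRS-only baseline ranks the reference pairs by a disproportionality metric and scores that ranking with ROC AUC. The sketch below assumes two common metrics, the reporting odds ratio (ROR) and the proportional reporting ratio (PRR), computed from a 2×2 table of report counts with a 0.5 continuity correction added to each cell; the source does not list which metrics or corrections were used. AUC is computed as the probability that a randomly chosen positive pair outscores a randomly chosen negative pair, with ties counted as one half.

```python
from typing import Sequence


def reporting_odds_ratio(a: float, b: float, c: float, d: float,
                         correction: float = 0.5) -> float:
    """ROR = (a/b) / (c/d) for a drug-event 2x2 table of report counts.

    a: reports with the drug and the event   b: drug, not the event
    c: event, not the drug                   d: neither
    `correction` is added to every cell so empty cells do not divide by zero.
    """
    a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a / b) / (c / d)


def proportional_reporting_ratio(a: float, b: float, c: float, d: float,
                                 correction: float = 0.5) -> float:
    """PRR = [a/(a+b)] / [c/(c+d)], with the same cell layout and correction as ROR."""
    a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a / (a + b)) / (c / (c + d))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With reference-set labels (1 for a positive control, 0 for a negative control), `roc_auc([reporting_odds_ratio(*t) for t in tables], labels)`, where `tables` is a hypothetical list of count tuples, gives an SRS-only baseline; the same function scores a classifier's predicted probabilities, so both can be compared on one scale.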
Results from this study support the hypothesis that knowledge extracted from the literature can enhance the performance of SRS-based methods (and vice versa). Across both reference sets, combining literature and SRS information performed better than using either source alone, providing strong support for the complementary nature of these approaches to post-marketing drug surveillance.
Compliance with Ethical Standards
This work was supported by a US National Library of Medicine Grant (R01 LM011563).
Conflict of interest
Justin Mower, Trevor Cohen, and Devika Subramanian have no conflicts of interest relevant to the content of this study.