Drug Safety, Volume 43, Issue 1, pp 67–77

Complementing Observational Signals with Literature-Derived Distributed Representations for Post-Marketing Drug Surveillance

  • Justin Mower
  • Trevor Cohen
  • Devika Subramanian
Original Research Article



Because of the well-documented limitations of data collected by spontaneous reporting systems (SRS), such as bias and under-reporting, a number of authors have evaluated the utility of other data sources for pharmacovigilance, including the biomedical literature. Previous work has demonstrated the utility of literature-derived distributed representations (concept embeddings) with machine learning for drug side-effect prediction. In terms of data sources, the literature-based and SRS-based methods are complementary, observing drug safety from two different perspectives (knowledge extracted from the literature and statistics from SRS data). However, the combined utility of these pharmacovigilance methods has yet to be evaluated.


This research investigates the utility of directly or indirectly combining an observational signal from SRS with literature-derived distributed representations into a single feature vector or in an ensemble approach for downstream machine learning (logistic regression).
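The two combination strategies named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the proportional reporting ratio (PRR) stands in here as one common disproportionality measure, and the function names, the equal ensemble weighting, and the toy numbers are all assumptions made for illustration.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio from a 2x2 table of report counts.

    a: reports with the drug and the event
    b: reports with the drug and any other event
    c: reports with other drugs and the event
    d: reports with other drugs and any other event
    """
    return (a / (a + b)) / (c / (c + d))


def composite_features(embedding, signal):
    """Direct combination: append the SRS signal to the literature embedding,
    giving one feature vector for a single downstream classifier."""
    return list(embedding) + [signal]


def ensemble_score(p_literature, p_srs, weight=0.5):
    """Indirect combination: blend the outputs of two separate models.
    The default 0.5 weight is an illustrative assumption."""
    return weight * p_literature + (1.0 - weight) * p_srs


# Toy values, for illustration only.
signal = prr(20, 80, 100, 9800)
features = composite_features([0.1, -0.2], signal)
blended = ensemble_score(0.8, 0.4)
```

In the direct form a single logistic regressor sees both sources at once; in the ensemble form each source is modelled separately and only the predicted scores are merged.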


Leveraging a recently developed representation scheme, concept embeddings were generated from relational connections extracted from the literature and composed to represent drug and associated adverse reactions, as defined by two reference standards of positive (likely causal) and negative (no causal evidence) pairs. Embeddings were presented with and without common measures of observational signal from SRS sources to logistic regressors, and performance was evaluated with the receiver operating characteristic (ROC) area under the curve (AUC) metric.
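For readers unfamiliar with the evaluation metric, the following sketch computes ROC AUC directly from its rank interpretation: the probability that a randomly chosen positive pair receives a higher score than a randomly chosen negative pair. It is a pure-Python illustration with a hypothetical function name, not the tooling used in the study.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for positive (likely causal) pairs, 0 for negative pairs
    scores: model outputs, higher meaning more likely positive
    Tied scores count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

The pairwise loop is quadratic in the number of examples, which is acceptable for small reference sets; a sort-based rank computation gives the same value for larger ones.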


ROC AUC performance with these composite models improves up to ≈ 20% over SRS-based disproportionality metrics alone and exceeds the best prior results reported in the literature when models leverage both sources of information.


Results from this study support the hypothesis that knowledge extracted from the literature can enhance the performance of SRS-based methods (and vice versa). Across reference sets, using literature and SRS information together performed better than using either source alone, providing strong support for the complementary nature of these approaches to post-marketing drug surveillance.


Compliance with Ethical Standards


This work was supported by a US National Library of Medicine Grant (R01 LM011563).

Conflict of interest

Justin Mower, Trevor Cohen, and Devika Subramanian have no conflicts of interest relevant to the content of this study.

Supplementary material

Supplementary material 1: 40264_2019_872_MOESM1_ESM.docx (DOCX, 725 kb)



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, Rice University, Houston, USA
  2. University of Washington, Biomedical Informatics and Medical Education, Seattle, USA
