
Complementing Observational Signals with Literature-Derived Distributed Representations for Post-Marketing Drug Surveillance

  • Original Research Article
  • Drug Safety

Abstract

Introduction

Because of the well-documented limitations of data collected by spontaneous reporting systems (SRS), such as bias and under-reporting, a number of authors have evaluated other data sources for pharmacovigilance, including the biomedical literature. Previous work has demonstrated the utility of literature-derived distributed representations (concept embeddings), combined with machine learning, for drug side-effect prediction. The two kinds of method draw on complementary data sources and so observe drug safety from different perspectives: knowledge extracted from the literature and statistics from SRS data. However, the combined utility of these pharmacovigilance methods has yet to be evaluated.

Objective

This research investigates the utility of combining an observational signal from SRS with literature-derived distributed representations, either directly (in a single feature vector) or indirectly (in an ensemble approach), for downstream machine learning (logistic regression).

Methods

Leveraging a recently developed representation scheme, concept embeddings were generated from relational connections extracted from the literature and composed to represent drugs and their associated adverse reactions, as defined by two reference standards of positive (likely causal) and negative (no causal evidence) pairs. Embeddings were presented to logistic regressors with and without common measures of observational signal from SRS sources, and performance was evaluated with the area under the receiver operating characteristic curve (ROC AUC).
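The evaluation design described above can be illustrated with a minimal sketch. It is not the study's implementation: the data are synthetic, the function names (`compose_features`, `fit_logistic`, `run_demo`) are hypothetical, and the ensemble shown (a plain average of two models' scores) is only one assumed way of combining sources indirectly. The sketch uses the Python standard library alone to stay self-contained; in practice a library such as scikit-learn would normally supply the logistic regression and ROC AUC.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def compose_features(embedding, srs_signal=None):
    """Embedding alone, or embedding with the SRS signal appended as one extra feature."""
    features = list(embedding)
    if srs_signal is not None:
        features.append(srs_signal)
    return features


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(model, x):
    """Predicted probability that a drug/event pair is a positive pair."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def run_demo(seed=0, n_pairs=200, dim=8):
    """Compare embedding-only, SRS-only, concatenated, and averaged models on synthetic data."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_pairs)]  # alternate classes so both splits are balanced
    # Synthetic stand-ins (assumption): each source carries a weak, noisy hint of the label.
    embeddings = [[rng.gauss(0.3 * y, 1.0) for _ in range(dim)] for y in labels]
    signals = [rng.gauss(0.8 * y, 1.0) for y in labels]

    split = n_pairs // 2
    y_train, y_test = labels[:split], labels[split:]

    def test_scores(features):
        # Train on the first half, score the held-out second half.
        model = fit_logistic(features[:split], y_train)
        return [score(model, x) for x in features[split:]]

    emb = test_scores([compose_features(e) for e in embeddings])
    srs = test_scores([[s] for s in signals])
    combined = test_scores([compose_features(e, s) for e, s in zip(embeddings, signals)])
    ensemble = [(p + q) / 2.0 for p, q in zip(emb, srs)]  # assumed ensemble: mean of two scores

    named = {"embedding": emb, "srs": srs, "combined": combined, "ensemble": ensemble}
    return {name: roc_auc(y_test, s) for name, s in named.items()}


if __name__ == "__main__":
    for name, auc in run_demo().items():
        print(f"{name}: ROC AUC = {auc:.3f}")
```

Because the inputs are random stand-ins, the printed AUC values say nothing about the study's results; the sketch only shows how the same classifier and metric can be applied to each feature configuration so that the configurations are directly comparable.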

Results

ROC AUC performance with these composite models improves by up to ≈ 20% over SRS-based disproportionality metrics alone and, when models leverage both sources of information, exceeds the best prior results reported in the literature.

Conclusions

Results from this study support the hypothesis that knowledge extracted from the literature can enhance the performance of SRS-based methods (and vice versa). Across reference sets, using literature and SRS information together performed better than using either source alone, providing strong support for the complementary nature of these approaches to post-marketing drug surveillance.

Data Sharing

Data and code used in this study are available at: https://github.com/jusger/FAERLitComplement.

Notes

  1. Further information on the reference standards used in this study can be found in Sect. 2.2.

  2. Further information on the disproportionality measures used in this study can be found in Sect. 2.3.

  3. The Banda resource does not report the EBGM measure.
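
As background to the disproportionality measures mentioned in the notes above, the following sketch computes two widely used ones, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), from a 2×2 table of report counts. Which measures the study actually used is set out in its Sect. 2.3, which is not reproduced here, so these two are shown only as common examples. The counts are invented for illustration, and the EBGM measure is not sketched because it requires fitting an empirical Bayes model.

```python
def prr(a, b, c, d):
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)].

    a: reports with the drug and the event
    b: reports with the drug and any other event
    c: reports with other drugs and the event
    d: reports with other drugs and any other event
    """
    return (a / (a + b)) / (c / (c + d))


def ror(a, b, c, d):
    """Reporting odds ratio: (a / b) / (c / d), written as (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Invented counts for a single drug/event pair (illustration only).
counts = {"a": 20, "b": 80, "c": 100, "d": 9800}
print("PRR:", prr(**counts))
print("ROR:", ror(**counts))
```

With these invented counts the PRR is about 19.8 and the ROR is 24.5. Both functions divide by cell counts, so a real implementation would need to handle zero cells, for example with a continuity correction.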

Author information

Corresponding author

Correspondence to Justin Mower.

Ethics declarations

Funding

This work was supported by a US National Library of Medicine Grant (R01 LM011563).

Conflict of interest

Justin Mower, Trevor Cohen, and Devika Subramanian have no conflicts of interest relevant to the content of this study.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 725 kb)

About this article

Cite this article

Mower, J., Cohen, T. & Subramanian, D. Complementing Observational Signals with Literature-Derived Distributed Representations for Post-Marketing Drug Surveillance. Drug Saf 43, 67–77 (2020). https://doi.org/10.1007/s40264-019-00872-9

  • DOI: https://doi.org/10.1007/s40264-019-00872-9