Using Probabilistic Record Linkage of Structured and Unstructured Data to Identify Duplicate Cases in Spontaneous Adverse Event Reporting Systems

  • Original Research Article
Drug Safety

A Commentary to this article was published on 30 May 2017

Abstract

Introduction

Duplicate case reports in spontaneous adverse event reporting systems make it difficult for medical reviewers to perform individual and aggregate safety analyses efficiently. Duplicate cases can also bias data mining by generating spurious signals of disproportional reporting for product-adverse event pairs.

Objective

We have developed a probabilistic record linkage algorithm for identifying duplicate cases in the US Vaccine Adverse Event Reporting System (VAERS) and the US Food and Drug Administration Adverse Event Reporting System (FAERS).

Methods

In addition to using structured field data, the algorithm incorporates the unstructured narrative text of adverse event reports by examining clinical and temporal information extracted by the Event-based Text-mining of Health Electronic Records system, a natural language processing tool. The final component of the algorithm is a novel duplicate confidence value, calculated by a rule-based empirical approach that compares two case reports on a number of criteria.
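
The abstract does not give the scoring details, so the following is only a minimal sketch of the general Fellegi-Sunter style of probabilistic record linkage, not the authors' algorithm. The field names, the m and u probabilities, and the function names are hypothetical and chosen for illustration; the narrative-text features and the duplicate confidence value described above are not modeled.

```python
import math

# Hypothetical per-field probabilities (illustrative, not from the paper):
#   m = P(field agrees | the two reports are duplicates)
#   u = P(field agrees | the two reports are not duplicates)
FIELD_PARAMS = {
    "age": (0.90, 0.05),
    "sex": (0.95, 0.50),
    "onset_date": (0.85, 0.01),
    "product": (0.95, 0.10),
}


def field_weight(agree, m, u):
    """Return the log2 agreement or disagreement weight for one field."""
    if agree:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(report_a, report_b, params=FIELD_PARAMS):
    """Sum the field weights for a pair of reports.

    Fields missing from either report are skipped, so they contribute
    neither evidence for nor evidence against a match.
    """
    score = 0.0
    for field, (m, u) in params.items():
        a, b = report_a.get(field), report_b.get(field)
        if a is None or b is None:
            continue
        score += field_weight(a == b, m, u)
    return score


# Toy reports with made-up values.
report_1 = {"age": 34, "sex": "F", "onset_date": "2015-03-02", "product": "MMR"}
report_2 = dict(report_1)
report_3 = {"age": 61, "sex": "M", "onset_date": "2014-11-20", "product": "HPV4"}

print(match_score(report_1, report_2))  # positive: every field agrees
print(match_score(report_1, report_3))  # negative: every field disagrees
```

A pair whose total score exceeds a chosen threshold would be flagged as a candidate duplicate; where that threshold sits is a tuning decision this sketch leaves open.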

Results

For VAERS, the algorithm identified 77% of known duplicate pairs with a precision (or positive predictive value) of 95%. For FAERS, it identified 13% of known duplicate pairs with a precision of 100%. The textual information did not improve the algorithm’s automated classification for VAERS or FAERS. The empirical duplicate confidence value increased performance on both VAERS and FAERS, mainly by reducing the occurrence of false positives.
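
To make the two reported measures concrete: the first percentage in each sentence is recall (the share of known duplicate pairs the algorithm found) and the second is precision (the share of flagged pairs that are known duplicates). The sketch below computes both for a set of flagged pairs; the function names and the toy report IDs are hypothetical and are not data from the study.

```python
def normalize(pairs):
    """Treat (a, b) and (b, a) as the same unordered pair."""
    return {frozenset(pair) for pair in pairs}


def precision_recall(predicted_pairs, known_pairs):
    """Return (precision, recall) of predicted duplicate pairs
    measured against a reference set of known duplicate pairs."""
    predicted = normalize(predicted_pairs)
    known = normalize(known_pairs)
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Toy report IDs, invented for illustration only.
known = [(1, 2), (3, 4), (5, 6), (7, 8)]
predicted = [(2, 1), (3, 4), (9, 10)]

precision, recall = precision_recall(predicted, known)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case two of the three flagged pairs are known duplicates and two of the four known pairs are found. High precision with low recall, as reported for FAERS, means that almost everything flagged is a true duplicate but most known duplicates are missed.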

Conclusions

The algorithm was shown to be effective at identifying pre-linked duplicate VAERS reports. The narrative text was not shown to be a key component in the automated detection evaluation; however, it is essential for supporting the semi-automated approach that is likely to be deployed at the Food and Drug Administration, where medical reviewers will perform some manual review of the most highly ranked reports identified by the algorithm.

Notes

  1. Available from http://plagiarism.bloomfieldmedia.com/wordpress/software/copyfind/.

Acknowledgements

The authors thank Ezekiel Maier for several conversations and suggestions that have enhanced the technical aspects of this work.

Author information

Corresponding author

Correspondence to Kory Kreimeyer.

Ethics declarations

Funding

This work was supported in part by the appointment of Kory Kreimeyer to the Research Participation Program administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the US Department of Energy and the US Food and Drug Administration.

Conflict of interest

Kory Kreimeyer, David Menschik, Scott Winiecki, Wendy Paul, Faith Barash, Emily Jane Woo, Meghna Alimchandani, Deepa Arya, Craig Zinderman, Richard Forshee, and Taxiarchis Botsis have no conflicts of interest directly relevant to the content of this article.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 29 kb)

About this article

Cite this article

Kreimeyer, K., Menschik, D., Winiecki, S. et al. Using Probabilistic Record Linkage of Structured and Unstructured Data to Identify Duplicate Cases in Spontaneous Adverse Event Reporting Systems. Drug Saf 40, 571–582 (2017). https://doi.org/10.1007/s40264-017-0523-4

  • DOI: https://doi.org/10.1007/s40264-017-0523-4
