Use of Screening Algorithms and Computer Systems to Efficiently Signal Higher-Than-Expected Combinations of Drugs and Events in the US FDA’s Spontaneous Reports Database
Since 1998, the US Food and Drug Administration (FDA) has been exploring new automated and rapid Bayesian data mining techniques. These techniques have been used to systematically screen the FDA’s huge MedWatch database of voluntary reports of adverse drug events for possible events of concern.
The data mining method currently in use is the Multi-Item Gamma Poisson Shrinker (MGPS) program, which replaced the Gamma Poisson Shrinker (GPS) program we originally used with the legacy database. The MGPS algorithm, the technical aspects of which are summarised in this paper, computes signal scores for pairs, and for higher-order (e.g. triplet, quadruplet) combinations, of drugs and events that are significantly more frequent than their pair-wise associations would predict. MGPS generates consistent, redundant, and replicable signals while minimising random patterns. Signals are generated without using external exposure data, adverse event background information, or medical information on adverse drug reactions. The MGPS interface streamlines multiple input-output processes that previously had been manually integrated. The system, however, cannot distinguish between already-known associations and new associations, so reviewers must filter these events.
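The shrinkage idea behind these signal scores can be sketched in a few lines. This is a minimal illustration, not the MGPS implementation: it assumes the expected count comes from simple independence of drug and event, with no stratification, and it uses a single gamma prior with made-up hyperparameters `alpha` and `beta`, whereas the empirical Bayes method described by DuMouchel fits a mixture of two gamma distributions to the data. The function names are hypothetical.

```python
def expected_count(n_drug, n_event, n_total):
    """Expected number of reports for a drug-event pair if the drug and the
    event were reported independently of each other."""
    return n_drug * n_event / n_total


def shrunk_ratio(observed, expected, alpha=1.0, beta=1.0):
    """Posterior mean of the reporting ratio under a Poisson likelihood and a
    single Gamma(alpha, beta) prior (illustrative hyperparameters only)."""
    return (alpha + observed) / (beta + expected)


# Hypothetical counts: the same raw O/E of 10 at two very different sample sizes.
small = (1, 0.1)      # 1 report observed, 0.1 expected
large = (100, 10.0)   # 100 reports observed, 10 expected

for observed, expected in (small, large):
    raw = observed / expected
    print(f"N={observed:>3}  E={expected:>4}  raw O/E={raw:.1f}  "
          f"shrunk={shrunk_ratio(observed, expected):.2f}")
```

With these assumed settings, the pair supported by a single report is pulled strongly towards the prior mean of 1, while the pair supported by 100 reports stays close to its raw ratio. This is the sense in which shrinkage suppresses random patterns arising from small counts.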
In addition to detecting possible serious single-drug adverse event problems, MGPS is currently being evaluated to detect possible synergistic interactions between drugs (drug interactions) and adverse events (syndromes), and to detect differences among subgroups defined by gender and by age, such as paediatrics and geriatrics.
In the current data, only 3.4% of the 1.2 million drug-event pairs ever reported (with frequencies ≥ 1) generate signals, defined as an EB05 of ≥ 2, where EB05 denotes the lower 95% confidence limit of the adjusted ratio of observed to expected (O/E) counts. The frequency counts contributing to signals total 2.4 million, or 23% of the 10.4 million drug-event pair occurrences reported, which greatly facilitates a more focused follow-up and evaluation.
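The screening rule and the two percentages in this paragraph can be reproduced with a short sketch. The EB05 values and counts in `pairs` are invented for illustration and the helper name is hypothetical; only the threshold EB05 ≥ 2 and the reported totals come from the text.

```python
EB05_THRESHOLD = 2.0  # signal criterion stated in the text


def summarise_signals(pairs, threshold=EB05_THRESHOLD):
    """pairs: iterable of (eb05, report_count) tuples, one per drug-event pair.

    Returns (fraction of pairs that signal,
             fraction of all report counts that fall in signalling pairs).
    """
    pairs = list(pairs)
    signals = [(eb05, n) for eb05, n in pairs if eb05 >= threshold]
    pair_fraction = len(signals) / len(pairs)
    count_fraction = sum(n for _, n in signals) / sum(n for _, n in pairs)
    return pair_fraction, count_fraction


# Toy data (hypothetical): (EB05, number of reports) for five pairs.
pairs = [(0.4, 3), (1.1, 10), (2.5, 40), (0.9, 7), (6.0, 20)]
pair_fraction, count_fraction = summarise_signals(pairs)

# Arithmetic on the figures reported in the text.
signalling_pairs = 0.034 * 1_200_000   # number of pairs with EB05 >= 2
count_share = 2.4 / 10.4               # share of report counts in signals
```

On the reported figures, 3.4% of 1.2 million corresponds to roughly 40,800 signalling pairs, and 2.4 million out of 10.4 million is about 23%. A small fraction of pairs therefore accounts for a much larger fraction of the report volume.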
The algorithm provides an objective, systematic view of the data, alerting reviewers to critically important new safety signals. The study of signals detected by current methods, of signals stored in the Center for Drug Evaluation and Research’s Monitoring Adverse Reports Tracking System, and of the signals regarding cerivastatin, a cholesterol-lowering drug voluntarily withdrawn from the market in August 2001, exemplifies the potential of data mining to improve early signal detection. The operating characteristics of data mining in detecting early safety signals, exemplified by studying a drug recently well characterised by large clinical trials, confirm our experience that the signals generated by data mining have high enough specificity to deserve further investigation. The application of these tools may ultimately improve usage recommendations.
Keywords: Cerivastatin; Event Code; Label Event; Spontaneous Report Database; Gamma Poisson Shrinker
The data mining technology referred to in this article was developed with grants from the Office of Women’s Health and the Center for Drug Evaluation and Research of the Food and Drug Administration and from an ‘Unmet Needs’ Grant from the National Centers for Disease Control and Prevention, United States Department of Health & Human Services.
We thank William DuMouchel of AT&T for developing the empirical Bayes data mining algorithms that we are applying to frequency counts; David Fram of Lincoln Technologies, Inc, and Jeremy Pool, Ilya Yunus, and Ava-Robin Cohen of PPD Informatics™ for providing critical technical information and development and implementation expertise; Diane Wysowski and Janos Bacsanyi from CDER for providing adverse event signals detected by current methods; and Susan Ellenberg, Miles Braun, and Manette Niu from CBER, FDA, and Henry Rolka from CDC for invaluable feedback and collaboration. We thank Phillip Perucci and Stacey Nichols from FDA for very valuable technical support.